Onclinx, Victor
[UCL]
As collecting and storing large numbers of features becomes ever easier, industrial and research fields increasingly have to deal with high-dimensional data. Unfortunately, the richness of such data comes at the cost of interpretability. Before any quantitative analysis, acquiring prior knowledge is of primary importance in guiding the choice of data analysis models. The data set contains information that cannot be easily identified in a high-dimensional space, for instance the presence of clusters or the proximity relationships between data points. To make these features visible, dimensionality reduction techniques embed the data in a low-dimensional space while trying to preserve the pertinent information in the data set. This work shows how to preserve this pertinent information. Our contribution is twofold: it studies the representation of data on a manifold and the preservation of an ordering relationship. First, if we assume that the data set lies close to a manifold, this manifold carries part of the information. When the data set contains loops, one should represent the data on a manifold such as the sphere. The second contribution studies the preservation of an ordering relationship, e.g. of ranks. Ranks are computed by sorting the rows or the columns of the distance matrices. The preservation of ranks is motivated by recent quality criteria that assess a representation by counting the points that are close to each other both in the original space and in the representation space. For this purpose, these criteria compare the rank matrices.
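As an illustration of the rank computation described in the abstract, the following minimal sketch builds a rank matrix by sorting each row of a Euclidean distance matrix, then counts the pairs of points that are K-nearest neighbours in both spaces. It is not taken from the thesis: the function names (`pairwise_distances`, `rank_matrix`, `neighbourhood_agreement`) are hypothetical, and the simple K-neighbourhood overlap is only an assumed stand-in for the quality criteria mentioned above. The sketch also assumes that the points are distinct, so that each point receives rank 0 with respect to itself.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix of the rows of X (shape: n x n)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def rank_matrix(D):
    """Rank of point j with respect to point i, from sorting row i of D.

    ranks[i, j] = 0 for j == i (assuming distinct points),
    1 for the nearest neighbour of i, 2 for the next one, and so on.
    """
    order = np.argsort(D, axis=1, kind="stable")  # neighbours by distance
    ranks = np.empty_like(order)
    rows = np.arange(D.shape[0])[:, None]
    # Invert the permutation: the j-th entry of the sorted row gets rank j.
    ranks[rows, order] = np.arange(D.shape[1])[None, :]
    return ranks


def neighbourhood_agreement(R_high, R_low, k):
    """Fraction of K-neighbour pairs shared by both rank matrices.

    Assumed illustrative criterion: counts pairs (i, j) with
    1 <= rank <= k in both spaces, normalised by k * n.
    """
    n = R_high.shape[0]
    in_high = (R_high >= 1) & (R_high <= k)
    in_low = (R_low >= 1) & (R_low <= k)
    return (in_high & in_low).sum() / (k * n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_high = rng.normal(size=(50, 10))   # toy high-dimensional data
    X_low = X_high[:, :2]                # naive "embedding": keep two coordinates
    R_high = rank_matrix(pairwise_distances(X_high))
    R_low = rank_matrix(pairwise_distances(X_low))
    print(neighbourhood_agreement(R_high, R_low, k=5))
```

A value of 1 means that every point keeps exactly the same K nearest neighbours in the representation; lower values indicate that neighbourhoods have been altered by the embedding.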
Bibliographic reference: Onclinx, Victor. Dimensionality reduction for visualization: representing on manifolds and preserving ranks. Prom.: Verleysen, Michel; Wertz, Vincent
Permanent URL: http://hdl.handle.net/2078.1/160847