Marion, Rebecca
[UCL]
Govaerts, Bernadette
[UCL]
von Sachs, Rainer
[UCL]
This paper introduces a new method, Adaptive Clustering around Latent Variables (AdaCLV), for simultaneous dimensionality reduction and variable clustering, that is, the partitioning of variables into groups. This unsupervised method is particularly well suited to the exploration of spectroscopic datasets, such as Nuclear Magnetic Resonance (NMR) spectra, and can be used to identify potential biomarkers. AdaCLV is inspired by existing multivariate methods from the Clustering around Latent Variables (CLV) family, but it offers two key advantages over these methods. First, AdaCLV allows variables to belong to multiple clusters to varying degrees. A cluster membership degree is estimated for each variable and cluster, and these memberships are used to define non-orthogonal latent variables that summarize the clusters. As a result, the clusters and latent variables identified by AdaCLV are more interpretable and more representative of spectroscopic data, where peaks for different molecules (i.e. variable clusters) may overlap and variables within a cluster have different degrees of importance. Second, while the performance of existing methods depends greatly on hyperparameter selection, AdaCLV is less sensitive to its hyperparameters and adapts to the clustering structure present in the data. This paper compares AdaCLV with existing CLV methods and other competitors in experiments involving real and semi-artificial NMR spectra. AdaCLV is shown to be more robust to hyperparameter choice and to have better precision than the other methods for all cluster numbers, sample sizes and signal levels tested, while achieving a comparable level of recall.
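The idea of membership degrees defining non-orthogonal latent variables can be illustrated with a toy sketch. This is not the AdaCLV estimation procedure: the membership degrees below are fixed by hand rather than estimated, and the data values, the `latent_variable` and `correlation` helpers, and the weighted-sum construction are all assumptions made for illustration. The sketch only shows that when one variable has non-zero membership in two clusters, the resulting cluster summaries are correlated rather than orthogonal.

```python
# Illustrative sketch only: NOT the AdaCLV estimation procedure.
# Memberships are fixed by hand here; AdaCLV estimates them from the data.
from math import sqrt


def latent_variable(X, weights):
    """Membership-weighted sum of the columns of X (rows are samples)."""
    return [sum(w * x for w, x in zip(weights, row)) for row in X]


def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Toy data: 5 samples, 3 variables (hypothetical values).
X = [
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 1.5],
    [3.0, 4.0, 1.0],
    [4.0, 3.0, 3.0],
    [5.0, 5.0, 2.0],
]

# Hypothetical membership degrees: rows are clusters, columns are variables.
# The second variable belongs partly to both clusters (an overlapping peak).
memberships = [
    [1.0, 0.6, 0.0],
    [0.0, 0.4, 1.0],
]

# One latent variable (cluster summary) per cluster.
latent = [latent_variable(X, w) for w in memberships]

# A non-zero correlation indicates the summaries are not orthogonal.
overlap_corr = correlation(latent[0], latent[1])
print(overlap_corr)
```

Setting the shared variable's membership to zero in one of the clusters yields a hard partition, which is the setting the abstract contrasts with.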
Bibliographic reference
Marion, Rebecca ; Govaerts, Bernadette ; von Sachs, Rainer. AdaCLV for Interpretable Variable Clustering and Dimensionality Reduction of Spectroscopic Data. In: Chemometrics and Intelligent Laboratory Systems, Vol. 206 (2020)
Permanent URL
http://hdl.handle.net/2078.1/229602