Struys, Axel
[UCL]
Pircalabelu, Eugen
[UCL]
Many unsupervised clustering algorithms exist for partitioning input data into groups of individuals. In this thesis, the input data are restricted to categorical variables. However, it is often difficult to recover the decision rules that a clustering algorithm used to assign cluster labels. Because clustering algorithms usually group the data based on distances between individuals, there is usually no clear relationship between the modalities of the input categorical variables and the cluster labels. While one can always compute association metrics between the cluster labels and the input variables, these can be difficult for decision-makers to interpret. This thesis therefore explores methods that cluster the input data by constructing an unsupervised decision tree, which makes it possible to visualize the decision rules used to cluster the data. The methods, namely Divclus-T and CUBT, are used to cluster socio-economic data on first-year university students. The resulting decision trees make it easy to understand the groups of students that constitute a given faculty. The stability of these methods is assessed and compared with that of traditional hierarchical clustering, and it is shown to be at least as good. Using three simulation studies, it is shown that both methods perform at least as well as hierarchical clustering in recovering the true partition.
Bibliographic reference
Struys, Axel. Unsupervised decision trees: application on socio-economical data of university students. Faculté des sciences, Université catholique de Louvain, 2024. Prom. : Pircalabelu, Eugen.
Permanent URL
http://hdl.handle.net/2078.1/thesis:44066