Jamotton, Charlotte
[UCL]
Hainaut, Donatien
[UCL]
Hames, Thomas
The k-means algorithm and its variants are popular clustering techniques. Their purpose is to uncover group structures in a dataset. In actuarial applications, these partitioning methods detect clusters of policies with similar features and allow one to draw up a map of dominant risks. The main challenge lies in defining a distance between two observations exclusively characterised by categorical variables. This research paper starts with a review of the k-means algorithm and develops an extension based on Burt's framework to manage categorical rating factors. We then focus on a mini-batch version that keeps computation time under control when analysing a large-scale dataset. We next broaden the scope of application of the fuzzy k-means to fully categorised datasets. Lastly, we conclude with a thorough introduction to spectral clustering and work around the dimensionality issue by reducing the size of the initial dataset with k-means.
Bibliographic reference
Jamotton, Charlotte ; Hainaut, Donatien ; Hames, Thomas. Insurance analytics with clustering techniques. LIDAM Discussion Paper ISBA ; 2023/02 (2023) 27 pages
Permanent URL
http://hdl.handle.net/2078.1/270714