Van den Broeck, Martial
[UCL]
Macq, Benoît
[UCL]
A common assumption in machine learning is to suppose that the training and the testing distribution are the same. Unfortunately, this assumption is not always true, especially in computer vision: one of the biggest challenges that visual recognition systems must face in the real world is the ability to have good performances on visually different data samples. The interest in this problem is growing in the computer vision community, and an increasing number of data scientists focus on that to develop models that can generalize well to unseen distribution. In this project, we investigated how the natural language description of the visual domains could be helpful to improve the generalization ability of a recognition model. In particular, we propose a metric learning based approach to learn a common embedding over the images and text such that nearby image and phrase pairs in the embedding space are related. Then we use the learned representation to improve the prediction on the testing data based on its similarity with the training distribution. We obtain encouraging results on the PACS dataset showing how a multi-modal learning based on natural language can improve the generalization ability of a model.
Bibliographic reference |
Van den Broeck, Martial. Domain-To-Text : improving domain generalization using natural language. Ecole polytechnique de Louvain, Université catholique de Louvain, 2023. Prom. : Macq, Benoît. |
Permanent URL |
http://hdl.handle.net/2078.1/thesis:38752 |