Escouflaire, Louis
[UCL]
Blochowiak, Joanna
[UCL]
Degand, Liesbeth
[UCL]
de Marneffe, Marie-Catherine
[UCL]
We analyze how well human annotators, ChatGPT, and a fine-tuned CamemBERT transformer model predict French words that differ subtly in meaning. We focus on car and parce que, two French connectives considered near-synonymous and distinguished only by fine-grained syntactic, semantic, and pragmatic features. Our test set consists of 420 sentences from French news articles and SMS text messages, each containing car or parce que; the connective was masked and had to be predicted. Our results suggest that this task is particularly difficult for both native speakers of French and large language models. However, we find that fine-tuning CamemBERT on a training corpus of 10,000 masked sentences containing car vs. parce que allows it to grasp the syntactic and semantic subtleties that distinguish the connectives, and to perform significantly better than the human annotators.
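The masking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' preprocessing: the helper name `mask_connective`, the regular expression, and the `<mask>` placeholder string are all assumptions made for illustration. The sketch does not handle the elided form "parce qu'", nor does it distinguish the connective car from the noun car ("coach").

```python
import re

# Hypothetical pattern (an assumption, not from the paper): matches the
# connectives "parce que" or "car" as whole words, ignoring case.
# Elided "parce qu'" and the noun "car" are deliberately not handled.
CONNECTIVE = re.compile(r"\b(parce que|car)\b", re.IGNORECASE)


def mask_connective(sentence, mask_token="<mask>"):
    """Replace the first connective with mask_token.

    Returns (masked_sentence, gold_label), or None if the sentence
    contains neither connective.
    """
    match = CONNECTIVE.search(sentence)
    if match is None:
        return None
    # Normalize the gold label to lowercase so "Car" and "car" agree.
    label = match.group(0).lower()
    masked = sentence[:match.start()] + mask_token + sentence[match.end():]
    return masked, label
```

Each resulting pair holds a sentence with the connective hidden and the gold label that a human annotator or a model would then have to recover.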


Bibliographic reference
Escouflaire, Louis ; Blochowiak, Joanna ; Degand, Liesbeth ; de Marneffe, Marie-Catherine. Which connective fits best: ‘car’ or ‘parce que’? A challenge for both humans and LLMs. JADT 2024: 17th International Conference on Statistical Analysis of Textual Data (Brussels, Belgium, from 25/06/2024 to 27/06/2024). In: JADT 2024 Proceedings: 17th International Conference on Statistical Analysis of Textual Data, JADT: Brussels, 2024, Vol. 1, p. 319-328
Permanent URL
http://hdl.handle.net/2078.1/288770