Colson, Jean-Pierre
[UCL]
Phraseology has often been criticized for its lack of terminological consistency and for its very diverse approaches, as well as for its weak theoretical underpinnings. For all these reasons, computational phraseology has an important role to play as the interface between corpus linguistics and other approaches to morpho-syntax or semantics. In this contribution, some of the practical and theoretical aspects of this debate are illustrated by means of the French recurrent pattern en tout (tous, toute, toutes) and its associated phraseological units (PUs). In traditional dictionaries, PUs of this type are usually absent or poorly described. An experiment with two corpora (one year of a newspaper on the one hand and the 10-billion-word corpus of the Sketch Engine on the other) confirms that huge collections of texts are necessary in order to describe recurrent patterns of this type. From the point of view of computational phraseology, it turns out that experiments with frequency may be useful for such PUs, but only if some syntactic or semantic information is added to the frequency criterion. When used on its own, frequency yields irrelevant results, which is another indication that frequency and fixedness are quite different linguistic phenomena. We therefore suggest the use of the cpr-score (Corpus Proximity Ratio, Colson 2016) as a tentative step towards establishing the degree of attraction between the component grams of an n-gram, which makes it possible to extract most of the contiguous sequences in an automated way. While opening new possibilities for the practical description of phraseology, this methodology also raises the theoretical question of the role of statistical association with regard to morpho-syntax. Additional experiments on the basis of morphemes indicate that traditional corpus linguistics based on words may be insufficient for tracing back the natural associations of linguistic meanings expressed by morphemes.
Alternatively, experiments with morpheme-based patterns can take into account the diversity of languages, while at the same time providing supporting evidence for a third articulation of language, as described by Mejri (2006).
Bibliographic reference
Colson, Jean-Pierre. Les traces du figement dans les corpus : une étude de cas. In: Français Moderne : revue de linguistique française, Vol. 86, no. 1, p. 129-145 (2018)
Permanent URL
http://hdl.handle.net/2078.1/199427