Crible, Ludivine
[UCL]
Cuenca, Maria Josep
[Universitat de València]
Among the vast literature on discourse markers (henceforth DMs) and discourse-relational devices in general, one aspect of their behaviour has been somewhat overlooked, namely their co-occurrence. It is frequently the case that two or more DMs co-occur, as in the case of so for instance if, where DMs only co-occur or are juxtaposed, or in the case of but actually, and yet, but look, etc., where they actually combine. DM co-occurrence is a multi-faceted phenomenon, since not all cases display the same degree of integration: most authors distinguish between at least two types of co-occurrence, namely addition vs. composition, depending on a number of syntactic and functional criteria (see, e.g., Luscher 1993; Hansen 1998; Pons 2008, in press; Cuenca & Marín 2009). Discourse analysis and corpus annotation show that this phenomenon is quite pervasive: 20% of all occurrences are coded as part of a co-occurring string in Crible’s (2017) corpus study of spoken English and French. In fact, DM co-occurrence poses a challenge for corpus annotation since i) it is not always clear whether two co-occurring DMs remain independent from each other or whether they should be considered as one token, and ii) senses can be influenced by co-occurring DMs during disambiguation. This study sets out to provide clear criteria for different degrees of co-occurrence on the basis of corpus-based examples.
Previous papers on the subject propose several criteria to distinguish different degrees of integration. Luscher (1993) uses syntactic and semantic scope to distinguish between “additive” and “compositional” sequences. He defines the latter as applying to two adjacent DMs which are semantically similar (e.g. French mais pourtant ‘but however’), one of them being more restricted or specific in its meaning than the other. This latter type is the focus of Fraser’s (2013) study targeting English contrastive connectives.
Hansen’s (1998) distinction between summative and combinatory sequences adopts a different perspective and depends on whether the elements in the sequence retain their individual meaning (French ah bon ‘oh really’) or form a new complex one (eh bien ‘well’). She argues that most DM sequences are summative (or compositional), since it is always possible to reconstruct the meaning of each element. Similarly, Pons (2008) concludes from his analysis of the co-occurrences of the Spanish modal marker bueno with other discourse markers that the segmentation of oral discourse makes it possible to differentiate two configurations: cases in which the two markers are simply adjacent and cases in which they combine, according to whether they apply to different structural units or to a single one. More recently, Dostie (2013) and Crible (2015) consider other types of cues in DM use that provide evidence for stronger degrees of combination, such as phonological reduction (eh bien to eh ben), new spellings (ou sinon ‘or else’ to aussi non) and new contexts of use (initial to final position for ou sinon).
Cuenca & Marín (2009) discuss and illustrate a three-fold distinction in a corpus of spoken Spanish and Catalan, namely:
• juxtaposition, when the DMs combine neither syntactically nor semantically (typically two conjunctions);
• addition, when the DMs combine locally but their functions remain distinct (typically conjunctions followed by parenthetical connectives that jointly connect at a local level);
• composition, when the DMs function as one unit (typically two parenthetical connective units with a single global-level function).
Their analysis is very fine-grained and identifies recurrent formal and functional tendencies for each of these levels. Crible (2017) attempted to apply Cuenca & Marín’s (2009) classification through systematic annotation and was confronted with problematic, borderline cases (e.g. and so or et alors ‘and then’) which raised concerns about some features, pointing especially at the fuzzy border between addition and composition. Crible also discusses the role of frequency in the definition of these levels, and suggests an additional degree to deal with cases of “reinforcement” (e.g. but in fact). Her study draws attention to the consequences of an adequate treatment of DM co-occurrence for corpus annotation (token identification and sense disambiguation). Similarly, in the guidelines of the Penn Discourse TreeBank 2.0, Prasad et al. (2007) mention that multiple (i.e. co-occurring) connectives should ideally be annotated as such and differentiated according to the (in)dependence of their elements in order to improve predictive features and classifiers.
The purpose of this study is to revisit Cuenca & Marín’s (2009) three-fold classification and refine the criteria to distinguish each degree of co-occurrence, in order to be able to apply them systematically to corpus data. To this end, we used a sample of English conversational data from the DisFrEn dataset where DMs were already identified (Crible 2017): 71 DM clusters were thus extracted, from a total of 17,479 words (about 90 minutes of recordings). We did not consider phrasal DMs (e.g. I mean) as co-occurring. We further excluded cases where two DMs belong to different units (final position of the first unit, initial position of the second one, as in I like winter actually but I prefer spring). For each cluster, we manually encoded the following features: number of elements in the cluster, syntactic category (based on Cuenca 2013: conjunction, parenthetical connective, pragmatic connective, interjection), scope (same or different), position (initial or medial). We then discussed whether the elements of the cluster expressed the same meaning (or function) or not, and what degree of co-occurrence they represented.
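As a rough illustration of the encoding scheme just described, the features recorded for each cluster could be held in a small record like the Python sketch below. Only the feature inventory (number of elements, syntactic category, scope, position) comes from the description above; the class and field names (ClusterAnnotation, same_scope, etc.) are hypothetical and are not taken from the DisFrEn annotation scheme, and the example values are illustrative rather than actual corpus annotations.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Category(Enum):
    # Syntactic categories listed in the text (based on Cuenca 2013).
    CONJUNCTION = "conjunction"
    PARENTHETICAL = "parenthetical connective"
    PRAGMATIC = "pragmatic connective"
    INTERJECTION = "interjection"


class Position(Enum):
    INITIAL = "initial"
    MEDIAL = "medial"


@dataclass(frozen=True)
class ClusterAnnotation:
    """One manually encoded DM cluster (hypothetical record layout)."""
    markers: Tuple[str, ...]           # the co-occurring DMs, in order
    categories: Tuple[Category, ...]   # one syntactic category per marker
    same_scope: bool                   # True if the DMs scope over the same unit
    position: Position                 # position of the cluster in its host unit

    def __post_init__(self):
        # A cluster is by definition a co-occurrence of two or more DMs.
        if len(self.markers) < 2:
            raise ValueError("a cluster needs at least two co-occurring DMs")
        if len(self.markers) != len(self.categories):
            raise ValueError("one syntactic category per marker is required")

    @property
    def size(self) -> int:
        # Number of elements in the cluster.
        return len(self.markers)


# Illustrative values only, not an annotation taken from the corpus.
example = ClusterAnnotation(
    markers=("and", "then"),
    categories=(Category.CONJUNCTION, Category.PARENTHETICAL),
    same_scope=True,
    position=Position.INITIAL,
)
```

Keeping scope as an explicit field, separate from syntactic category, mirrors the distinction drawn in the analysis between defining criteria and mere tendencies.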
Thanks to this qualitative analysis, we were able to distinguish between criteria (necessary conditions) and features (quantitative tendencies): we found that considerations of scope and of function are criterial in the definition of the levels, whereas prosody (i.e. contiguous pause) and syntactic categories are mere tendencies. As a result, the revised cline of co-occurrence is the following:
• juxtaposition, when the DMs take scope over different units (mostly when a coordinating conjunction and a subordinating conjunction co-occur);
• addition, when the DMs have the same scope but clearly distinct meanings;
• composition, when the DMs have the same scope and the same overall meaning, but one of them is more specific than the other (reinforcement effect);
• lexicalization, when a new meaning arises from the co-occurrence which is not the sum of its parts, and removing one element changes the meaning of the cluster (often with semantic bleaching and phonological reduction).
This proposal takes into account the dynamicity of language and phenomena such as layering and stratification, related to polyfunctionality and underspecification. For instance, in English the highly frequent cluster and then instantiates different configurations and degrees of integration depending on the semantic status of the temporal adverbial. The first (and most frequent) use of and then (1) is an addition of the additive conjunction and the temporal adverb. In another related use (2), the elements add to express consequence, a meaning which can be derived – but differs – from the temporal meaning of then (‘at that time’). Lastly, and then (3) can express one global function of continuity or enumeration at discourse level (i.e. not temporality between facts) with contrastive nuances, in which case the co-occurrence lies somewhere in between composition and lexicalization, since the meaning of the cluster is not (strictly) the sum of its parts.
(1) they buy the book say for a couple of pounds (1.420) and then return it and get half
(2) I've got people coming I'll get some salmon from the stall and when you get down there you find he hasn't actually got any and then it throws you into a complete quandary
(3) people do tend to describe themselves […] a lot of people describe people as jealous […] and then there are the really bland ones
It can be concluded that a single co-occurrence such as and then can instantiate different categorical configurations and can also vary along the cline of co-occurrence, which argues for a flexible, context-bound approach to the issue in future annotation endeavours. These distinctions are subtle and highly context-bound, yet they can and should be systematically accounted for, especially since and then is also quite frequent in writing (cf. but then or so for instance, mentioned in the PDTB guidelines). Additional features (e.g. prosody, length and type of host unit) can be investigated to further support this flexible portrait of and then.
To conclude, in line with Crible & Cuenca (2017), we suggest that DM annotation endeavours should consider including information about co-occurrence, minimally by identifying clusters, ideally by distinguishing between degrees of integration following the criteria that we have developed in this study. This is particularly crucial for sequences such as and then (and its cross-linguistic equivalents, e.g. French et puis), whose functional profile is not unique but varies with the degree of co-occurrence. Our criteria and analysis pave the way for fruitful comparisons across spoken and written languages.
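The two defining criteria of the revised cline (scope and meaning) can be read as a small decision procedure. The Python sketch below is one possible rendering, not the authors' annotation tool: the function name, the three boolean judgments and the order in which they are tested are assumptions made for illustration. The judgments themselves remain manual, context-bound decisions, and intermediate cases such as use (3) of and then, described as lying between composition and lexicalization, are not captured by a categorical return value.

```python
from enum import Enum


class Degree(Enum):
    JUXTAPOSITION = "juxtaposition"
    ADDITION = "addition"
    COMPOSITION = "composition"
    LEXICALIZATION = "lexicalization"


def degree_of_cooccurrence(same_scope: bool,
                           same_meaning: bool,
                           new_meaning: bool) -> Degree:
    """Map three manual judgments onto the four-level cline.

    same_scope:   the DMs take scope over the same unit
    same_meaning: the DMs share one overall meaning, one being more specific
    new_meaning:  the cluster's meaning is not the sum of its parts
    (Parameter names and test order are assumptions of this sketch.)
    """
    if not same_scope:
        # Different scope is sufficient for juxtaposition.
        return Degree.JUXTAPOSITION
    if new_meaning:
        # Assumption: a non-compositional meaning takes precedence.
        return Degree.LEXICALIZATION
    if same_meaning:
        # Same scope, same overall meaning: reinforcement effect.
        return Degree.COMPOSITION
    # Same scope but clearly distinct meanings.
    return Degree.ADDITION


# Use (1) of "and then" as described in the text: same scope, distinct meanings.
print(degree_of_cooccurrence(same_scope=True, same_meaning=False, new_meaning=False))
```

Prosody and syntactic category are deliberately left out of the function's inputs, since the analysis treats them as tendencies rather than necessary conditions.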


Bibliographic reference
Crible, Ludivine ; Cuenca, Maria Josep. Co-occurrence of discourse markers: from juxtaposition to composition. Cross-linguistic Discourse Annotation: Applications and Perspectives (TextLink2018) (Toulouse, France, from 19/03/2018 to 21/03/2018).
Permanent URL
http://hdl.handle.net/2078.1/193573