Tack, Anaïs
[UCL]
François, Thomas
[UCL]
Desmet, Piet
[KU Leuven]
Fairon, Cédrick
[UCL]
Recent years have seen the emergence of a number of graded lexical resources that further research on first and second language lexical complexity (Dürlich & François, 2018; François et al., 2014; François et al., 2016; Lété et al., 2004). These lexical resources describe the frequency distributions of lexemes graded along a particular learning or difficulty scale (e.g. the Common European Framework of Reference, or CEFR, scale) and are, in particular, corpus-based, machine-readable and open-licensed. The lexical frequencies included in these resources are commonly estimated on a corpus of L1 and L2 learning materials, comprising either textbooks and simplified readers (receptive lexicons) or learner texts (productive lexicons), and can hence easily be used for pedagogical purposes. Furthermore, the resources are not only available via an online query engine for teachers and/or researchers, but can also be used as components of a readability-driven learning platform (Pilán, Volodina, & Borin, 2016) or an automated essay grading system (Pilán, Volodina, & Zesch, 2016). Until now, these CEFR-graded lexical resources have been made available for only a few European languages, including French, Swedish and English. The rationale of our current research is to expand upon these previous developments with NT2Lex, a new resource for Dutch as a foreign language. We also aim to address one of the shortcomings of the previously developed resources. Although these graded lexicons make the inherent complexity of words more apparent than traditional frequency-based lexicons do, we argue that they still lack information about word sense complexity, since their lexical entries are disambiguated only per lemma and part of speech. Our principal aim is therefore to advance a new type of graded lexicon: a word-sense disambiguated (WSD) graded lexicon linked to Open Dutch WordNet (Postma et al., 2016).
The objectives of our study are twofold. First, we will present the final version of the NT2Lex resource, which includes single-word and multi-word lexical entries that are part-of-speech tagged with Frog (van den Bosch et al., 2007) and word-sense disambiguated with a Dutch WSD tool. Second, we will present work in progress on the use of word sense complexity features obtained with NT2Lex in L2 readability research, comparing them to more traditional indices of lexical complexity such as lexical sophistication and hypernymy (Crossley & Salsbury, 2010).
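To illustrate the difference between a lexicon disambiguated per lemma and part of speech and one disambiguated per word sense, the following minimal Python sketch may help. It is not the NT2Lex format: the `SenseEntry` class, the `first_level` and `merge_by_lemma_pos` helpers, the sense identifiers and all frequency values are hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass

# The six levels of the CEFR scale, from lowest to highest.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


@dataclass(frozen=True)
class SenseEntry:
    """One sense-level entry of a graded lexicon (hypothetical layout)."""
    lemma: str
    pos: str
    sense_id: str         # placeholder identifier, not a real WordNet ID
    freq_by_level: tuple  # one frequency per level, aligned with CEFR_LEVELS


def first_level(freqs):
    """Return the first CEFR level with a non-zero frequency, or None."""
    for level, freq in zip(CEFR_LEVELS, freqs):
        if freq > 0:
            return level
    return None


def merge_by_lemma_pos(entries):
    """Collapse sense-level entries into lemma + POS entries by summing frequencies."""
    merged = {}
    for entry in entries:
        key = (entry.lemma, entry.pos)
        previous = merged.get(key, (0.0,) * len(CEFR_LEVELS))
        merged[key] = tuple(a + b for a, b in zip(previous, entry.freq_by_level))
    return merged


# Invented frequencies for two senses of the Dutch noun "bank".
entries = [
    SenseEntry("bank", "N", "bank-n-1", (12.0, 8.0, 5.0, 3.0, 0.0, 0.0)),
    SenseEntry("bank", "N", "bank-n-2", (0.0, 0.0, 4.0, 6.0, 0.0, 0.0)),
]

merged = merge_by_lemma_pos(entries)
for entry in entries:
    print(entry.sense_id, first_level(entry.freq_by_level))
print("bank/N (merged)", first_level(merged[("bank", "N")]))
```

In this toy data, the merged lemma and part-of-speech entry hides the fact that the second sense is first observed at a later level than the first, which is the kind of information a sense-disambiguated lexicon is meant to retain.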


Bibliographic reference |
Tack, Anaïs ; François, Thomas ; Desmet, Piet ; Fairon, Cédrick. Making Sense of L2 Lexical Complexity with NT2Lex, a CEFR-graded Lexicon Linked to Open Dutch WordNet. The XIXth International Computer Assisted Language Learning (CALL) Research Conference (Bruges, from 04/07/2018 to 06/07/2018). |
Permanent URL |
http://hdl.handle.net/2078.1/199110 |