SW4ALL: a CEFR Classified and Aligned Corpus for Language Learning

Souza Wilkens, Rodrigo; Zilio, Leonardo; Fairon, Cédrick

DIAL.pr - BOREAL

Souza Wilkens, Rodrigo [UCL]

Zilio, Leonardo [UCL]

Fairon, Cédrick [UCL]

Learning a second language requires considerable time and dedication. Part of the process involves reading and writing texts in the target language, so, to facilitate this process, especially reading, teachers tend to search for texts that match the interests and capabilities of their learners. Searching for this kind of text, however, is itself time-consuming. To address this need for texts suited to different language learners, we present SW4ALL, a corpus of documents classified by language proficiency level (based on the CEFR recommendations) that allows learners to observe how the same topic or content can be described using strategies from different proficiency levels. The corpus uses the alignments between the English Wikipedia and the Simple English Wikipedia to ensure that the texts in each pair share similar content or topic, and an annotation of language levels to ensure that the two texts differ in proficiency level. Given the size of the corpus, we used an automatic approach for the annotation, followed by an analysis to sort out annotation errors. SW4ALL contains 8,669 pairs of documents that present different levels of language proficiency.

metadata

Document type	Contribution to a collective work (Book Chapter) – Chapter
Access type	Open access
Publication date	2018
Language	English
Host document	Nicoletta Calzolari, Khalid Choukri, Christopher Cieri et al.; "Eleventh International Conference on Language Resources and Evaluation (LREC 2018)" (ISBN: 979-10-95546-00-9)
Peer reviewed	yes
Publisher	European Language Resources Association (Paris)
Publication status	Published
Affiliation	UCL - SSH/ILC/PLIN - Pôle de recherche en linguistique
Links	http://www.lrec-conf.org/proceedings/lrec2018/pdf/1012.pdf ; http://hdl.handle.net/2078.1/208085 (Handle)

Bibliographic reference	Souza Wilkens, Rodrigo; Zilio, Leonardo; Fairon, Cédrick. SW4ALL: a CEFR Classified and Aligned Corpus for Language Learning. In: Nicoletta Calzolari, Khalid Choukri, Christopher Cieri et al., Eleventh International Conference on Language Resources and Evaluation (LREC 2018), European Language Resources Association: Paris, 2018
Permanent URL	http://hdl.handle.net/2078.1/208085
