Inadequacy of the chi-squared test to examine vocabulary differences between corpora

Bestgen, Yves

download

Chi2AndCorpusLinguistics - Pre-print (Open access, PDF, 208.74 K)

Bestgen, Yves [UCL]

Pearson's chi-squared test is probably the most popular statistical test used in corpus linguistics, particularly for studying linguistic variations between corpora. Oakes and Farrow (Literary and Linguistic Computing, 2007, 22, 85-99) proposed various adaptations of this test in order to allow for the simultaneous comparison of more than two corpora, while also yielding an almost correct Type I error rate (i.e. claiming that a word is most frequently found in a variety of English, when in actuality this is not the case). By means of resampling procedures, the present study shows that when used in this context, the chi-squared test produces far too many significant results, even in its modified version. Several potential approaches to circumventing this problem are discussed in the conclusion.
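The contrast described in the abstract can be illustrated with a minimal sketch. It is not the procedure used in the article: the function names (`chi2_2x2`, `word_chi2`, `permutation_p_value`), the choice of whole texts as the resampling unit, and the toy counts are all assumptions made here for illustration. The sketch compares the usual chi-squared statistic for one word in two corpora with a permutation test that reassigns whole texts to corpora at random.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: the word (or a corpus) is empty
    return n * (a * d - b * c) ** 2 / denom


def word_chi2(texts_a, texts_b):
    """Each corpus is a list of (word_count, text_length) pairs, one pair per text."""
    wa = sum(w for w, _ in texts_a)
    na = sum(n for _, n in texts_a)
    wb = sum(w for w, _ in texts_b)
    nb = sum(n for _, n in texts_b)
    # Rows: corpus A, corpus B. Columns: target word, all other tokens.
    return chi2_2x2(wa, na - wa, wb, nb - wb)


def permutation_p_value(texts_a, texts_b, n_perm=2000, seed=0):
    """Share of random reassignments of texts whose statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = word_chi2(texts_a, texts_b)
    pooled = list(texts_a) + list(texts_b)
    k = len(texts_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if word_chi2(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p above zero


# Hypothetical toy data: ten 1000-token texts per corpus.
# In corpus A a single text accounts for all 40 occurrences of the word.
corpus_a = [(40, 1000)] + [(0, 1000)] * 9
corpus_b = [(1, 1000)] * 5 + [(0, 1000)] * 5

print("chi-squared:", round(word_chi2(corpus_a, corpus_b), 2))
print("permutation p:", permutation_p_value(corpus_a, corpus_b))
```

In this toy case the chi-squared statistic is well above 3.84, the critical value for one degree of freedom at the 0.05 level, whereas the permutation p-value is not small, because the whole difference comes from one text. The chi-squared test treats every token as an independent observation, and that assumption breaks down when occurrences of a word cluster within texts.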

metadata

Document type	Journal article
Access type	Mixed access
Publication date	2014
Language	English
Journal information	"Literary and Linguistic Computing : the journal of digital scholarship in the humanities" - Vol. 29, no. 2, p. 164-170 (2014)
Peer reviewed	yes
Publisher	Oxford University Press (Oxford)
issn	0268-1145
e-issn	1477-4615
Publication status	Published
Affiliation	UCL - SSH/IPSY - Psychological Sciences Research Institute
Links	DOI: https://doi.org/10.1093/llc/fqt020 ; Handle: http://hdl.handle.net/2078.1/156101

Bibliographic reference	Bestgen, Yves. Inadequacy of the chi-squared test to examine vocabulary differences between corpora. In: Literary and Linguistic Computing : the journal of digital scholarship in the humanities, Vol. 29, no. 2, p. 164-170 (2014)
Permanent URL	http://hdl.handle.net/2078.1/156101
