van Daal, Tine [UA, Belgium]
Lesterhuis, Marije [UA, Belgium]
Coertjens, Liesje [UCL]
Donche, Vincent [UA, Belgium]
De Maeyer, Sven [UA, Belgium]
Recently, comparative judgement has been introduced as an alternative method for scoring essays. Although the method is promising in terms of obtaining reliable scores, empirical evidence concerning its validity is lacking. The current study examines the implications of two critical assumptions underpinning the use of comparative judgement: that judgement is holistic, and that the final rank order reflects a shared consensus on what makes a good essay. Judges’ justifications for their decisions are qualitatively analysed to gain insight into the dimensions of academic writing they take into account. The results show that most arguments relate directly to the competence description. However, judges also draw on their own expertise to judge the quality of essays. Additionally, judges differ in how they conceptualise writing quality and in the extent to which they tap into their own expertise. Finally, the study explores the diverging conceptualisations of misfitting judges.
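In comparative judgement, judges repeatedly decide which of two essays is the better one, and a measurement model converts these pairwise decisions into a single rank order. As a minimal sketch of how this works, assuming the Bradley-Terry model cited in the references below (the quality parameters v_i are our notation, not the authors’), the probability that a judge prefers essay i over essay j is

P(i ≻ j) = exp(v_i) / (exp(v_i) + exp(v_j)) = 1 / (1 + exp(−(v_i − v_j)))

Maximum-likelihood estimates of the v_i then define the final rank order, which is why the validity question examined here hinges on whether the holistic pairwise decisions feeding the model express a shared consensus on essay quality.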
- Azarfam A. Y., Advances in Asian Social Science, 1, 139 (2012)
- Bloxham Sue, Marking and moderation in the UK: false assumptions and wasted resources, 10.1080/02602930801955978
- Bradley Ralph Allan, Terry Milton E., Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons, 10.2307/2334029
- Bramley T., Techniques for monitoring the comparability of examination standards, 246 (2007)
- Bramley T., Investigating the reliability of Adaptive Comparative Judgment (2015)
- Braun Virginia, Clarke Victoria, Using thematic analysis in psychology, 10.1191/1478088706qp063oa
- Cohen J., Statistical power analysis for the behavioral sciences (1988)
- Crisp Victoria, Criteria, comparison and past experiences: how do teachers make judgements when marking coursework?, 10.1080/0969594x.2012.741059
- Cumming Alister, Kantor Robert, Powers Donald E., Decision Making while Rating ESL/EFL Writing Tasks: A Descriptive Framework, 10.1111/1540-4781.00137
- Eckes Thomas, Rater types in writing performance assessments: A classification approach to rater variability, 10.1177/0265532207086780
- Field A., Discovering statistics using R (2012)
- Gebril Atta, Plakans Lia, Assembling validity evidence for assessing academic writing: Rater reactions to integrated tasks, 10.1016/j.asw.2014.03.002
- Heldsinger Sandra, Humphry Stephen, Using the method of pairwise comparison to obtain reliable teacher assessments, 10.1007/bf03216919
- Heldsinger Sandra A., Humphry Stephen M., Using calibrated exemplars in the teacher-assessment of writing: an empirical study, 10.1080/00131881.2013.825159
- Humphry Stephen M., McGrane Joshua A., Equating a large-scale writing assessment using pairwise comparisons of performances, 10.1007/s13384-014-0168-6
- Huot B. A., Validating holistic scoring for writing assessment: Theoretical and empirical foundations, 206 (1993)
- Jones Ian, Alcock Lara, Peer assessment without assessment criteria, 10.1080/03075079.2013.821974
- Jones I., Proceedings of the 37th conference of the International Group for the Psychology of Mathematics Education, 1, 113 (2013)
- Jones Ian, Swan Malcolm, Pollitt Alastair, Assessing Mathematical Problem Solving Using Comparative Judgement, 10.1007/s10763-013-9497-6
- Kane Michael T., Validation as a Pragmatic, Scientific Activity, 10.1111/jedm.12007
- Laming D., Human judgment: The eye of the beholder (2003)
- Luce R. Duncan, On the possible psychophysical laws, 10.1037/h0043178
- Lumley Tom, Assessment criteria in a large-scale writing test: what do they really mean to the raters?, 10.1191/0265532202lt230oa
- Messick S., Meaning and Values in Test Validation: The Science and Ethics of Assessment, 10.3102/0013189x018002005
- Moss Pamela A., Validity in high stakes writing assessment: Problems and possibilities, 10.1016/1075-2935(94)90007-8
- Pollitt Alastair, Comparative judgement for assessment, 10.1007/s10798-011-9189-x
- Pollitt Alastair, The method of Adaptive Comparative Judgement, 10.1080/0969594x.2012.665354
- Pula J. J., Validating holistic scoring for writing assessment: Theoretical and empirical foundations, 237 (1993)
- Sadler D. Royce, Indeterminacy in the use of preset criteria for assessment and grading, 10.1080/02602930801956059
- Sakyi A. A., Fairness and validation in language assessment: Selected papers from the 19th Language Testing Research Colloquium, Orlando, Florida, 9, 129 (2000)
- Stemler S., Practical Assessment, Research & Evaluation, 7, 137 (2001)
- Thurstone L. L., A law of comparative judgment, 10.1037/h0070288
- Thurstone L. L., Attitudes Can Be Measured, 10.1086/214483
- Turner Heather, Firth David, Bradley-Terry Models in R: The BradleyTerry2 Package, 10.18637/jss.v048.i09
- Volker Martin A., Reporting effect size estimates in school psychology research, 10.1002/pits.20176
- Whitehouse C., Testing the validity of judgements about geography essays using the Adaptive Comparative Judgement method (2012)
- Whitehouse C., Using adaptive comparative judgement to obtain a highly reliable rank order in summative assessment (2012)
Bibliographic reference: van Daal, Tine ; Lesterhuis, Marije ; Coertjens, Liesje ; Donche, Vincent ; De Maeyer, Sven. Validity of comparative judgement to assess academic writing: examining implications of its holistic character and building on a shared consensus. In: Assessment in Education: Principles, Policy & Practice, Vol. 26, no. 1, p. 59-74 (2019)
Permanent URL: http://hdl.handle.net/2078.1/184263