Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025
This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics
Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024
This paper proposes to use depth perception to represent raters' decision in holistic evaluation of ESL essays, as an alternative medium to conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…
Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy
Arslan Mancar, Sinem; Gulleroglu, H. Deniz – International Journal of Assessment Tools in Education, 2022
The aim of this study is to analyse the importance of the number of raters and compare the results obtained by techniques based on Classical Test Theory (CTT) and Generalizability (G) Theory. The Kappa and Krippendorff alpha techniques based on CTT were used to determine the inter-rater reliability. In this descriptive research data consists of…
Descriptors: Comparative Analysis, Interrater Reliability, Advanced Placement, Scoring Rubrics
Vasfiye Geçkin; Ebru Kiziltas; Çagatay Çinar – Journal of Educational Technology and Online Learning, 2023
The quality of writing in a second language (L2) is one of the indicators of the level of proficiency for many college students to be eligible for departmental studies. Although certain software programs, such as Intelligent Essay Assessor or IntelliMetric, have been introduced to evaluate second-language writing quality, an overall assessment of…
Descriptors: Writing Evaluation, Second Language Learning, Second Language Instruction, Language Proficiency
Comparing Rating Modes: Analysing Live, Audio, and Video Ratings of IELTS Speaking Test Performances
Nakatsuhara, Fumiyo; Inoue, Chihiro; Taylor, Lynda – Language Assessment Quarterly, 2021
This mixed methods study compared IELTS examiners' scores when assessing spoken performances under live and two 'non-live' testing conditions using audio and video recordings. Six IELTS examiners assessed 36 test-takers' performances under the live, audio, and video rating conditions. Scores in the three rating modes were calibrated using the…
Descriptors: Video Technology, Audio Equipment, English (Second Language), Language Tests
Romeo, Marina; Yepes-Baldó, Montserrat; González, Vicenta; Burset, Silvia; Martín, Carolina; Bosch, Emma – International Journal of Instruction, 2022
The assessment process in higher education considers four aspects: assessment agents, procedure, content, and scoring. In this study, we delve into the who. We analyze the role of transversal competence assessment agents in the framework of professional internships in university master's degree programs, comparing the suitability of their…
Descriptors: Internship Programs, Higher Education, Evaluators, Masters Programs
Ramlall, Suvira; Singaram, V. S.; Sommerville, T. E. – Perspectives in Education, 2019
National and institutional policies to escalate the production of doctorates have raised concerns about the quality of PhDs in South Africa. This study evaluates examiner reports of doctorates by thesis and publication in clinical medicine to ascertain the criteria that examiners used to define a successful doctoral thesis. A qualitative…
Descriptors: Doctoral Dissertations, Educational Policy, Medical Research, Foreign Countries
Yun, Jiyeo – ProQuest LLC, 2017
Since researchers investigated automatic scoring systems in writing assessments, they have dealt with relationships between human and machine scoring, and then have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…
Descriptors: Interrater Reliability, Essays, Scoring, Evaluators
Han, Qie – Working Papers in TESOL & Applied Linguistics, 2016
This literature review attempts to survey representative studies within the context of L2 speaking assessment that have contributed to the conceptualization of rater cognition. Two types of studies are looked at: 1) studies that examine "how" raters differ (and sometimes agree) in their cognitive processes and rating behaviors, in terms…
Descriptors: Second Language Learning, Student Evaluation, Evaluators, Speech Tests
Canfield, Allison R.; Eigsti, Inge-Marie; de Marchena, Ashley; Fein, Deborah – Journal of Speech, Language, and Hearing Research, 2016
Purpose: This study examined narrative quality of adolescents with autism spectrum disorder (ASD) using a well-studied "story goodness" coding system. Method: Narrative samples were analyzed for distinct aspects of story goodness and rated by naïve readers on dimensions of story goodness, accuracy, cohesiveness, and oddness. Adolescents…
Descriptors: Autism, Pervasive Developmental Disorders, Adolescents, Comparative Analysis
Fernandez, Miguel; Siddiqui, Athar Munir – Language Testing in Asia, 2017
Background: Marking of essays is mainly carried out by human raters who bring in their own subjective and idiosyncratic evaluation criteria, which sometimes lead to discrepancy. This discrepancy may in turn raise issues like reliability and fairness. The current research attempts to explore the evaluation criteria of markers on a national level…
Descriptors: Grading, Evaluators, Evaluation Criteria, High Stakes Tests
Huang, Bin; Hillman, Jennifer; Biro, Frank M.; Ding, Lili; Dorn, Lorah D.; Susman, Elizabeth J. – Journal of Research on Adolescence, 2012
Adolescent sexual maturation is staged using Tanner criteria assessed by clinicians, parents, or adolescents. The physiology of sexual maturation is driven by gonadal hormones. We investigate Tanner stage progression as a function of increasing gonadal hormone concentration and compare performances of different raters. Fifty-six boys (mean age,…
Descriptors: Adolescents, Physiology, Puberty, Biochemistry
Attali, Yigal; Sinharay, Sandip – ETS Research Report Series, 2015
The "e-rater"® automated essay scoring system is used operationally in the scoring of "TOEFL iBT"® independent and integrated tasks. In this study we explored the psychometric added value of reporting four trait scores for each of these two tasks, beyond the total e-rater score. The four trait scores are word choice, grammatical…
Descriptors: Writing Tests, Scores, Language Tests, English (Second Language)
Wei, Jing; Llosa, Lorena – Language Assessment Quarterly, 2015
This article reports on an investigation of the role raters' language background plays in raters' assessment of test takers' speaking ability. Specifically, this article examines differences between American and Indian raters in their scores and scoring processes when rating Indian test takers' responses to the Test of English as a Foreign…
Descriptors: North Americans, Indians, Evaluators, English (Second Language)
McBride, Kara – Hispania, 2015
This study explores which features of Spanish as a foreign language (SFL) pronunciation most impact raters' evaluations. Native Spanish speakers (NSSs) from regions with different pronunciation norms were polled: 147 evaluators from northern Mexico and 99 evaluators from central Argentina. These evaluations were contrasted with ratings from…
Descriptors: Spanish, Pronunciation, Second Language Learning, Native Speakers