ERIC - Search Results

Showing 1 to 15 of 25 results

Utilizing Large Language Models for EFL Essay Grading: An Examination of Reliability and Validity in Rubric-Based Assessments

Peer reviewed

Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025

This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics

Depth-Perception-Based Representation in Holistic Rating on ESL Essay Writing

Peer reviewed

Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024

This paper proposes to use depth perception to represent raters' decision in holistic evaluation of ESL essays, as an alternative medium to conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…

Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy

Comparison of Inter-Rater Reliability Techniques in Performance-Based Assessment

Peer reviewed

Arslan Mancar, Sinem; Gulleroglu, H. Deniz – International Journal of Assessment Tools in Education, 2022

The aim of this study is to analyse the importance of the number of raters and compare the results obtained by techniques based on Classical Test Theory (CTT) and Generalizability (G) Theory. The Kappa and Krippendorff alpha techniques based on CTT were used to determine the inter-rater reliability. In this descriptive research data consists of…

Descriptors: Comparative Analysis, Interrater Reliability, Advanced Placement, Scoring Rubrics

Assessing Second-Language Academic Writing: AI vs. Human Raters

Peer reviewed

Vasfiye Geçkin; Ebru Kiziltas; Çagatay Çinar – Journal of Educational Technology and Online Learning, 2023

The quality of writing in a second language (L2) is one of the indicators of the level of proficiency for many college students to be eligible for departmental studies. Although certain software programs, such as Intelligent Essay Assessor or IntelliMetric, have been introduced to evaluate second-language writing quality, an overall assessment of…

Descriptors: Writing Evaluation, Second Language Learning, Second Language Instruction, Language Proficiency

Comparing Rating Modes: Analysing Live, Audio, and Video Ratings of IELTS Speaking Test Performances

Peer reviewed

Nakatsuhara, Fumiyo; Inoue, Chihiro; Taylor, Lynda – Language Assessment Quarterly, 2021

This mixed methods study compared IELTS examiners' scores when assessing spoken performances under live and two 'non-live' testing conditions using audio and video recordings. Six IELTS examiners assessed 36 test-takers' performances under the live, audio, and video rating conditions. Scores in the three rating modes were calibrated using the…

Descriptors: Video Technology, Audio Equipment, English (Second Language), Language Tests

Assessing Transversal Competences in Professional Internships: The Role of Assessment Agents

Peer reviewed

Romeo, Marina; Yepes-Baldó, Montserrat; González, Vicenta; Burset, Silvia; Martín, Carolina; Bosch, Emma – International Journal of Instruction, 2022

The assessment process in higher education considers four aspects: assessment agents, procedure, content, and scoring. In this study, we delve into the who. We analyze the role of transversal competence assessment agents in the framework of professional internships in university master's degree programs, comparing the suitability of their…

Descriptors: Internship Programs, Higher Education, Evaluators, Masters Programs

Doctorates by Thesis and Publication in Clinical Medicine: An Analysis of Examiners' Reports

Peer reviewed

Ramlall, Suvira; Singaram, V. S.; Sommerville, T. E. – Perspectives in Education, 2019

National and institutional policies to escalate the production of doctorates have raised concerns about the quality of PhDs in South Africa. This study evaluates examiner reports of doctorates by thesis and publication in clinical medicine to ascertain the criteria that examiners used to define a successful doctoral thesis. A qualitative…

Descriptors: Doctoral Dissertations, Educational Policy, Medical Research, Foreign Countries

The Impact of Rater Variability on Relationships among Different Effect-Size Indices for Inter-Rater Agreement between Human and Automated Essay Scoring

Yun, Jiyeo – ProQuest LLC, 2017

Since researchers investigated automatic scoring systems in writing assessments, they have dealt with relationships between human and machine scoring, and then have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…

Descriptors: Interrater Reliability, Essays, Scoring, Evaluators

Rater Cognition in L2 Speaking Assessment: A Review of the Literature

Peer reviewed

Han, Qie – Working Papers in TESOL & Applied Linguistics, 2016

This literature review attempts to survey representative studies within the context of L2 speaking assessment that have contributed to the conceptualization of rater cognition. Two types of studies are looked at: 1) studies that examine "how" raters differ (and sometimes agree) in their cognitive processes and rating behaviors, in terms…

Descriptors: Second Language Learning, Student Evaluation, Evaluators, Speech Tests

Story Goodness in Adolescents with Autism Spectrum Disorder (ASD) and in Optimal Outcomes from ASD

Peer reviewed

Canfield, Allison R.; Eigsti, Inge-Marie; de Marchena, Ashley; Fein, Deborah – Journal of Speech, Language, and Hearing Research, 2016

Purpose: This study examined narrative quality of adolescents with autism spectrum disorder (ASD) using a well-studied "story goodness" coding system. Method: Narrative samples were analyzed for distinct aspects of story goodness and rated by naïve readers on dimensions of story goodness, accuracy, cohesiveness, and oddness. Adolescents…

Descriptors: Autism, Pervasive Developmental Disorders, Adolescents, Comparative Analysis

Markers' Criteria in Assessing English Essays: An Exploratory Study of the Higher Secondary School Certificate (HSSC) in the Punjab Province of Pakistan

Peer reviewed

Fernandez, Miguel; Siddiqui, Athar Munir – Language Testing in Asia, 2017

Background: Marking of essays is mainly carried out by human raters who bring in their own subjective and idiosyncratic evaluation criteria, which sometimes lead to discrepancy. This discrepancy may in turn raise issues like reliability and fairness. The current research attempts to explore the evaluation criteria of markers on a national level…

Descriptors: Grading, Evaluators, Evaluation Criteria, High Stakes Tests

Correspondence between Gonadal Steroid Hormone Concentrations and Secondary Sexual Characteristics Assessed by Clinicians, Adolescents, and Parents

Peer reviewed

Huang, Bin; Hillman, Jennifer; Biro, Frank M.; Ding, Lili; Dorn, Lorah D.; Susman, Elizabeth J. – Journal of Research on Adolescence, 2012

Adolescent sexual maturation is staged using Tanner criteria assessed by clinicians, parents, or adolescents. The physiology of sexual maturation is driven by gonadal hormones. We investigate Tanner stage progression as a function of increasing gonadal hormone concentration and compare performances of different raters. Fifty-six boys (mean age,…

Descriptors: Adolescents, Physiology, Puberty, Biochemistry

Automated Trait Scores for "TOEFL"® Writing Tasks. Research Report. ETS RR-15-14

Peer reviewed

Attali, Yigal; Sinharay, Sandip – ETS Research Report Series, 2015

The "e-rater"® automated essay scoring system is used operationally in the scoring of "TOEFL iBT"® independent and integrated tasks. In this study we explored the psychometric added value of reporting four trait scores for each of these two tasks, beyond the total e-rater score. The four trait scores are word choice, grammatical…

Descriptors: Writing Tests, Scores, Language Tests, English (Second Language)

Investigating Differences between American and Indian Raters in Assessing TOEFL iBT Speaking Tasks

Peer reviewed

Wei, Jing; Llosa, Lorena – Language Assessment Quarterly, 2015

This article reports on an investigation of the role raters' language background plays in raters' assessment of test takers' speaking ability. Specifically, this article examines differences between American and Indian raters in their scores and scoring processes when rating Indian test takers' responses to the Test of English as a Foreign…

Descriptors: North Americans, Indians, Evaluators, English (Second Language)

Which Features of Spanish Learners' Pronunciation Most Impact Listener Evaluations?

Peer reviewed

McBride, Kara – Hispania, 2015

This study explores which features of Spanish as a foreign language (SFL) pronunciation most impact raters' evaluations. Native Spanish speakers (NSSs) from regions with different pronunciation norms were polled: 147 evaluators from northern Mexico and 99 evaluators from central Argentina. These evaluations were contrasted with ratings from…

Descriptors: Spanish, Pronunciation, Second Language Learning, Native Speakers
