ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	4
Since 2017 (last 10 years)	7
Since 2007 (last 20 years)	16

Descriptor

Computer Assisted Testing	18
Correlation	18
Interrater Reliability	18
Scoring	12
English (Second Language)	9
Foreign Countries	8
Second Language Learning	8
Evaluators	7
Comparative Analysis	6
Computer Software	6
Essays	6
Language Tests	6
Scores	6
Test Reliability	6
Test Validity	6
Educational Technology	4
Evaluation Methods	4
Writing Evaluation	4
Accuracy	3
At Risk Students	3
Automation	3
Diagnostic Tests	3
Error of Measurement	3
Essay Tests	3
Grading	3
More ▼

Source

ETS Research Report Series	2
Advances in Physiology…	1
American College Testing…	1
Applied Measurement in…	1
Computers & Education	1
Educational Research and…	1
Educational Testing Service	1
English Teaching	1
Grantee Submission	1
IEEE Transactions on Learning…	1
International Association for…	1
International Journal of…	1
Journal of Educational…	1
Journal of Mental Health…	1
Language Testing	1
Language, Speech, and Hearing…	1
ReCALL	1
More ▼

Publication Type

Journal Articles	15
Reports - Research	12
Reports - Descriptive	2
Reports - Evaluative	2
Tests/Questionnaires	2
Collected Works - Proceedings	1
Information Analyses	1

Education Level

Higher Education	4
Postsecondary Education	3
Secondary Education	3
Elementary Secondary Education	2
Grade 11	1

Audience

Location

Hong Kong	2
Israel	2
Singapore	2
Asia	1
Australia	1
Brazil	1
China	1
Connecticut	1
Denmark	1
Egypt	1
Estonia	1
Florida	1
Germany	1
Greece	1
Hawaii	1
Ireland	1
Italy	1
Japan	1
Kazakhstan	1
Netherlands	1
Norway	1
Ohio	1
Pakistan	1
Pennsylvania	1
Philippines	1
More ▼

Laws, Policies, & Programs

Assessments and Surveys

Test of English as a Foreign…	3
Strengths and Difficulties…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 18 results Save | Export

Meta-Analysis of Inter-Rater Agreement and Discrepancy Between Human and Automated English Essay Scoring

Peer reviewed
PDF on ERIC

Download full text

Direct link

Jiyeo Yun – English Teaching, 2023

Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…

Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Direct link

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards

Online Administration of the Test of Narrative Language--Second Edition: Psychometrics and Considerations for Remote Assessment

Peer reviewed
PDF on ERIC

Download full text

Direct link

Beula M. Magimairaj; Philip Capin; Sandra L. Gillam; Sharon Vaughn; Greg Roberts; Anna-Maria Fall; Ronald B. Gillam – Grantee Submission, 2022

Purpose: Our aim was to evaluate the psychometric properties of the online administered format of the Test of Narrative Language--Second Edition (TNL-2; Gillam & Pearson, 2017), given the importance of assessing children's narrative ability and considerable absence of psychometric studies of spoken language assessments administered online.…

Descriptors: Computer Assisted Testing, Language Tests, Story Telling, Language Impairments

The Influence of Rater Effects in Training Sets on the Psychometric Quality of Automated Scoring for Writing Assessments

Peer reviewed

Direct link

Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018

Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…

Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring

Online Administration of the Test of Narrative Language--Second Edition: Psychometrics and Considerations for Remote Assessment

Peer reviewed
PDF on ERIC

Download full text

Direct link

Beula M. Magimairaj; Philip Capin; Sandra L. Gillam; Sharon Vaughn; Greg Roberts; Anna-Maria Fall; Ronald B. Gillam – Language, Speech, and Hearing Services in Schools, 2022

Descriptors: Computer Assisted Testing, Language Tests, Story Telling, Language Impairments

Validating Human and Automated Scoring of Essays against "True" Scores

Peer reviewed

Direct link

Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018

In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to the essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…

Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing

The Influence of Training and Experience on Rater Performance in Scoring Spoken Language

Peer reviewed

Direct link

Davis, Larry – Language Testing, 2016

Two factors were investigated that are thought to contribute to consistency in rater scoring judgments: rater training and experience in scoring. Also considered were the relative effects of scoring rubrics and exemplars on rater performance. Experienced teachers of English (N = 20) scored recorded responses from the TOEFL iBT speaking test prior…

Descriptors: Evaluators, Oral Language, Scores, Language Tests

Automated Essay Feedback Generation and Its Impact on Revision

Peer reviewed

Direct link

Liu, Ming; Li, Yi; Xu, Weiwei; Liu, Li – IEEE Transactions on Learning Technologies, 2017

Writing an essay is a very important skill for students to master, but a difficult task for them to overcome. It is particularly true for English as Second Language (ESL) students in China. It would be very useful if students could receive timely and effective feedback about their writing. Automatic essay feedback generation is a challenging task,…

Descriptors: Foreign Countries, College Students, Second Language Learning, English (Second Language)

Subjective Mental Health, Peer Relations, Family, and School Environment in Adolescents with Intellectual Developmental Disorder: A First Report of a New Questionnaire Administered on Tablet PCs

Peer reviewed

Direct link

Boström, Petra; Johnels, Jakob Åsberg; Thorson, Maria; Broberg, Malin – Journal of Mental Health Research in Intellectual Disabilities, 2016

Few studies have explored the subjective mental health of adolescents with intellectual disabilities, while proxy ratings indicate an overrepresentation of mental health problems. The present study reports on the design and an initial empirical evaluation of the Well-being in Special Education Questionnaire (WellSEQ). Questions, response scales,…

Descriptors: Mental Health, Peer Relationship, Family Environment, Educational Environment

Investigating the Suitability of Implementing the "e-rater"® Scoring Engine in a Large-Scale English Language Testing Program. Research Report. ETS RR-13-36

Peer reviewed
PDF on ERIC

Download full text

Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013

In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…

Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests

Use of e-rater[R] in Scoring of the TOEFL iBT[R] Writing Test. Research Report. ETS RR-11-25

Download full text

Haberman, Shelby J. – Educational Testing Service, 2011

Alternative approaches are discussed for use of e-rater[R] to score the TOEFL iBT[R] Writing test. These approaches involve alternate criteria. In the 1st approach, the predicted variable is the expected rater score of the examinee's 2 essays. In the 2nd approach, the predicted variable is the expected rater score of 2 essay responses by the…

Descriptors: Writing Tests, Scoring, Essays, Language Tests

Marking Student Programs Using Graph Similarity

Peer reviewed

Direct link

Naude, Kevin A.; Greyling, Jean H.; Vogts, Dieter – Computers & Education, 2010

We present a novel approach to the automated marking of student programming assignments. Our technique quantifies the structural similarity between unmarked student submissions and marked solutions, and is the basis by which we assign marks. This is accomplished through an efficient novel graph similarity measure ("AssignSim"). Our experiments…

Descriptors: Grading, Assignments, Correlation, Interrater Reliability

A Comparison of Onscreen and Paper-Based Marking in the Hong Kong Public Examination System

Peer reviewed

Direct link

Coniam, David – Educational Research and Evaluation, 2009

This paper describes a study comparing paper-based marking (PBM) and onscreen marking (OSM) in Hong Kong utilising English language essay scripts drawn from the live 2007 Hong Kong Certificate of Education Examination (HKCEE) Year 11 English Language Writing Paper. In the study, 30 raters from the 2007 HKCEE Writing Paper marked on paper 100…

Descriptors: Student Attitudes, Foreign Countries, Essays, Comparative Analysis

Experimenting with a Computer Essay-Scoring Program Based on ESL Student Writing Scripts

Peer reviewed

Direct link

Coniam, David – ReCALL, 2009

This paper describes a study of the computer essay-scoring program BETSY. While the use of computers in rating written scripts has been criticised in some quarters for lacking transparency or lack of fit with how human raters rate written scripts, a number of essay rating programs are available commercially, many of which claim to offer comparable…

Descriptors: Writing Tests, Scoring, Foreign Countries, Interrater Reliability

Inventory of Work-Relevant Values: 2001 Revision. ACT Research Report Series, 2004-03

Download full text

Bobek, Becky L.; Gore, Paul A. – American College Testing (ACT), Inc., 2004

This research report describes changes made to the Inventory of Work-Relevant Values when it was revised for online use as a part of the Internet version of DISCOVER. Users will see the following differences between the online and CD-ROM versions of the inventory: 22 items rather than 61, simplified presentation, and the contribution of all items…

Descriptors: Interrater Reliability, Field Tests, Internet, Test Construction

Previous Page | Next Page »

Pages: 1 | 2

Anna-Maria Fall	2
Beula M. Magimairaj	2
Coniam, David	2
Greg Roberts	2
Philip Capin	2
Ronald B. Gillam	2
Sandra L. Gillam	2
Sharon Vaughn	2
Amanda Huee-Ping Wong	1
Ben-Simon, Anat	1
Bobek, Becky L.	1
Boström, Petra	1
Breyer, F. Jay	1
Broberg, Malin	1
Clariana, Roy B.	1
Cohen, Yoav	1
Davis, Larry	1
Engelhard, George, Jr.	1
Foltz, Peter	1
Gore, Paul A.	1
Greyling, Jean H.	1
Haberman, Shelby J.	1
Ivan Cherh Chiet Low	1
Jiyeo Yun	1
Johnels, Jakob Åsberg	1
More ▼