ERIC - Search Results

Publication Date

In 2026	0
Since 2025	3
Since 2022 (last 5 years)	12
Since 2017 (last 10 years)	30
Since 2007 (last 20 years)	60

Descriptor

Computer Assisted Testing	79
Interrater Reliability	79
Scoring	35
English (Second Language)	27
Second Language Learning	26
Foreign Countries	23
Computer Software	20
Writing Evaluation	20
Correlation	18
Evaluators	18
Language Tests	18
Essays	17
Evaluation Methods	16
Test Reliability	16
Comparative Analysis	15
Scores	15
Test Validity	14
Educational Technology	13
Writing Tests	12
Accuracy	11
College Students	11
Essay Tests	11
Grading	11
Automation	10
Scoring Rubrics	10

Publication Type

Journal Articles	61
Reports - Research	54
Reports - Evaluative	15
Speeches/Meeting Papers	8
Tests/Questionnaires	6
Reports - Descriptive	5
Dissertations/Theses -…	2
Numerical/Quantitative Data	2
Books	1
Collected Works - General	1
Collected Works - Proceedings	1
Information Analyses	1

Education Level

Higher Education	28
Postsecondary Education	23
Secondary Education	9
Elementary Secondary Education	8
High Schools	4
Elementary Education	3
Middle Schools	2
Grade 11	1
Grade 8	1
Preschool Education	1

Audience

Researchers	2
Practitioners	1
Teachers	1

Location

Turkey	5
China	4
Germany	3
South Korea	3
Australia	2
Hong Kong	2
Israel	2
Japan	2
Netherlands	2
Singapore	2
Taiwan	2
United Kingdom	2
Arizona	1
Asia	1
Brazil	1
China (Beijing)	1
Connecticut	1
Denmark	1
Egypt	1
Estonia	1
Florida	1
France	1
Greece	1
Hawaii	1
Ireland	1

Laws, Policies, & Programs

Pell Grant Program

Assessments and Surveys

Test of English as a Foreign…	12
National Assessment of…	2
ACT Assessment	1
Expressive One Word Picture…	1
Graduate Management Admission…	1
Graduate Record Examinations	1
Mean Length of Utterance	1
Peabody Picture Vocabulary…	1
Program for International…	1
Strengths and Difficulties…	1

Showing 1 to 15 of 79 results

Grading Exams Using Large Language Models: A Comparison between Human and AI Grading of Exams in Higher Education Using ChatGPT

Peer reviewed

Jonas Flodén – British Educational Research Journal, 2025

This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams compared to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…

Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring

Using Automated Procedures to Score Educational Essays Written in Three Languages

Peer reviewed

Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025

The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…

Descriptors: College Students, Slavic Languages, German, Italian

Best Practices for Constructed-Response Scoring. Research Report. ETS RR-22-17

Peer reviewed

McCaffrey, Daniel F.; Casabianca, Jodi M.; Ricker-Pedley, Kathryn L.; Lawless, René R.; Wendler, Cathy – ETS Research Report Series, 2022

This document describes a set of best practices for developing, implementing, and maintaining the critical process of scoring constructed-response tasks. These practices address both the use of human raters and automated scoring systems as part of the scoring process and cover the scoring of written, spoken, performance, or multimodal responses.…

Descriptors: Best Practices, Scoring, Test Format, Computer Assisted Testing

AI-Assisted Assessment of Inquiry Skills in Socioscientific Issue Contexts

Peer reviewed

Wen Xin Zhang; John J. H. Lin; Ying-Shao Hsu – Journal of Computer Assisted Learning, 2025

Background Study: Assessing learners' inquiry-based skills is challenging as social, political, and technological dimensions must be considered. The advanced development of artificial intelligence (AI) makes it possible to address these challenges and shape the next generation of science education. Objectives: The present study evaluated the SSI…

Descriptors: Artificial Intelligence, Computer Assisted Testing, Inquiry, Active Learning

Evaluating Quadratic Weighted Kappa as the Standard Performance Metric for Automated Essay Scoring

Peer reviewed

Doewes, Afrizal; Kurdhi, Nughthoh Arfawi; Saxena, Akrati – International Educational Data Mining Society, 2023

Automated Essay Scoring (AES) tools aim to improve the efficiency and consistency of essay scoring by using machine learning algorithms. In the existing research work on this topic, most researchers agree that human-automated score agreement remains the benchmark for assessing the accuracy of machine-generated scores. To measure the performance of…

Descriptors: Essays, Writing Evaluation, Evaluators, Accuracy

Examining AI-Based Accuracy Assessment in L2 Learners' Writing

Peer reviewed

On-Soon Lee – Journal of Pan-Pacific Association of Applied Linguistics, 2024

Despite the increasing interest in using AI tools as assistant agents in instructional settings, the effectiveness of ChatGPT, the generative pretrained AI, for evaluating the accuracy of second language (L2) writing has been largely unexplored in formative assessment. Therefore, the current study aims to examine how ChatGPT, as an evaluator,…

Descriptors: Foreign Countries, Undergraduate Students, English (Second Language), Second Language Learning

Establishing a Physics Concept Inventory Using Computer Marked Free-Response Questions

Peer reviewed

Parker, Mark A. J.; Hedgeland, Holly; Jordan, Sally E.; Braithwaite, Nicholas St. J. – European Journal of Science and Mathematics Education, 2023

The study covers the development and testing of the alternative mechanics survey (AMS), a modified force concept inventory (FCI), which used automatically marked free-response questions. Data were collected over a period of three academic years from 611 participants who were taking physics classes at high school and university level. A total of…

Descriptors: Test Construction, Scientific Concepts, Physics, Test Reliability

Automated Essay Scoring Effect on Test Equating Errors in Mixed-Format Test

Peer reviewed

Uysal, Ibrahim; Dogan, Nuri – International Journal of Assessment Tools in Education, 2021

Scoring constructed-response items can be highly difficult, time-consuming, and costly in practice. Improvements in computer technology have enabled automated scoring of constructed-response items. However, the application of automated scoring without an investigation of test equating can lead to serious problems. The goal of this study was to…

Descriptors: Computer Assisted Testing, Scoring, Item Response Theory, Test Format

Meta-Analysis of Inter-Rater Agreement and Discrepancy Between Human and Automated English Essay Scoring

Peer reviewed

Jiyeo Yun – English Teaching, 2023

Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…

Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring

Computer-Programmed Decision Trees for Assessing Teacher Noticing

Peer reviewed

Schack, Edna O.; Dueber, David; Thomas, Jonathan Norris; Fisher, Molly H.; Jong, Cindy – AERA Online Paper Repository, 2019

Scoring of teachers' noticing responses is typically burdened with rater bias and reliance upon interrater consensus. The authors sought to make the scoring process more objective, equitable, and generalizable. The development process began with a description of response characteristics for each professional noticing component disconnected from…

Descriptors: Models, Teacher Evaluation, Observation, Bias

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards

Online Administration of the Test of Narrative Language--Second Edition: Psychometrics and Considerations for Remote Assessment

Peer reviewed

Beula M. Magimairaj; Philip Capin; Sandra L. Gillam; Sharon Vaughn; Greg Roberts; Anna-Maria Fall; Ronald B. Gillam – Grantee Submission, 2022

Purpose: Our aim was to evaluate the psychometric properties of the online administered format of the Test of Narrative Language--Second Edition (TNL-2; Gillam & Pearson, 2017), given the importance of assessing children's narrative ability and considerable absence of psychometric studies of spoken language assessments administered online.…

Descriptors: Computer Assisted Testing, Language Tests, Story Telling, Language Impairments

Developing and Validating a Computerized Oral Proficiency Test of English as a Foreign Language (COPTEFL)

Peer reviewed

Isler, Cemre; Aydin, Belgin – International Journal of Assessment Tools in Education, 2021

This study is about the development and validation process of the Computerized Oral Proficiency Test of English as a Foreign Language (COPTEFL). The test aims at assessing the speaking proficiency levels of students in Anadolu University School of Foreign Languages (AUSFL). For this purpose, three monologic tasks were developed based on the Global…

Descriptors: Test Construction, Construct Validity, Interrater Reliability, Scores

Computer Assisted Evaluation Using Rubrics for Reduction of Errors and Inter and Intra Examiner Heterogeneity

Peer reviewed

Gauns Dessai, Kissan G.; Kamat, Venkatesh V. – International Journal of Information and Communication Technology Education, 2018

Educational institutions worldwide conduct summative examinations to evaluate academic performance of students. Such summative examinations are normally subjective in nature in higher education institutions and need manual evaluation. However, the manual evaluation of subjective answer-scripts often suffers from evaluation anomalies and the…

Descriptors: Computer Assisted Testing, Student Evaluation, Scoring Rubrics, Error Patterns

The Influence of Rater Effects in Training Sets on the Psychometric Quality of Automated Scoring for Writing Assessments

Peer reviewed

Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018

Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…

Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring

Source

ETS Research Report Series	7
Journal of Technology,…	4
Assessing Writing	2
Computers & Education	2
International Journal of…	2
Journal of Computer Assisted…	2
Journal of Educational…	2
Language Assessment Quarterly	2
Language Testing	2
Online Submission	2
SAGE Open	2
AERA Online Paper Repository	1
ALT-J: Research in Learning…	1
Advances in Physiology…	1
American College Testing…	1
Applied Measurement in…	1
Australian Educational…	1
British Educational Research…	1
British Journal of…	1
Cognition and Instruction	1
E-Learning	1
Education and Information…	1
Educational Leadership	1
Educational Research and…	1
Educational Testing Service	1

Author

Anna-Maria Fall	2
Aydin, Selami	2
Ben-Simon, Anat	2
Bennett, Randy Elliot	2
Beula M. Magimairaj	2
Casabianca, Jodi M.	2
Clariana, Roy B.	2
Coniam, David	2
Greg Roberts	2
Philip Capin	2
Ronald B. Gillam	2
Sandra L. Gillam	2
Sharon Vaughn	2
Wolfe, Edward W.	2
Alexander, R. Curby	1
Allen, Nancy	1
Alt, Mary	1
Amanda Huee-Ping Wong	1
Aydin, Belgin	1
Bejar, Isaac I.	1
Bell, John F.	1
Bennett, Randy Elliott	1
Bhola, Dennison S.	1
Bobek, Becky L.	1