ERIC - Search Results

Publication Date

In 2025	7
Since 2024	21

Descriptor

Comparative Analysis	21
Evaluators	21
Computer Software	14
Artificial Intelligence	13
Writing Evaluation	9
Accuracy	8
Essays	8
Computational Linguistics	7
English (Second Language)	7
Foreign Countries	7
Second Language Learning	7
Reliability	6
Scoring	6
Second Language Instruction	6
Correlation	5
Scores	5
Scoring Rubrics	5
Computer Assisted Testing	4
Language Tests	4
College Students	3
Evaluation Methods	3
Grading	3
Interrater Reliability	3
Item Analysis	3
Language Proficiency	3
More ▼

Source

Journal of Educational and…	2
Research Synthesis Methods	2
Teaching of Psychology	2
Advances in Health Sciences…	1
Advances in Physiology…	1
British Journal of…	1
IEEE Transactions on Learning…	1
Innovation in Language…	1
Innovations in Education and…	1
International Journal of…	1
International Journal of…	1
International Journal of…	1
Journal of Baltic Science…	1
Language Learning and…	1
Language Teaching Research…	1
ProQuest LLC	1
Reading & Writing Quarterly	1
Society for Research on…	1
More ▼

Publication Type

Journal Articles	19
Reports - Research	19
Tests/Questionnaires	2
Dissertations/Theses -…	1
Reports - Evaluative	1

Education Level

Higher Education	8
Postsecondary Education	8
Early Childhood Education	2
Elementary Education	2
Primary Education	2
Secondary Education	2
Elementary Secondary Education	1
Grade 1	1
Grade 2	1
Kindergarten	1

Audience

Location

China	2
Iran	1
Japan	1
Singapore	1
Texas	1
Thailand	1
Turkey	1

Laws, Policies, & Programs

Assessments and Surveys

International English…	2
Test of English for…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 21 results Save | Export

Implicit versus Explicit First Impressions in Performance-Based Assessment: Will Raters Overcome Their First Impressions When Learner Performance Changes?

Peer reviewed

Direct link

Timothy J. Wood; Vijay J. Daniels; Debra Pugh; Claire Touchie; Samantha Halman; Susan Humphrey-Murto – Advances in Health Sciences Education, 2024

First impressions can influence rater-based judgments but their contribution to rater bias is unclear. Research suggests raters can overcome first impressions in experimental exam contexts with explicit first impressions, but these findings may not generalize to a workplace context with implicit first impressions. The study had two aims. First, to…

Descriptors: Evaluators, Work Environment, Decision Making, Video Technology

Different in Different Ways: A Network-Analysis Approach to Voice and Prosody in Autism Spectrum Disorder

Peer reviewed

Direct link

Ethan Weed; Riccardo Fusaroli; Elizabeth Simmons; Inge-Marie Eigsti – Language Learning and Development, 2024

The current study investigated whether the difficulty in finding group differences in prosody between speakers with autism spectrum disorder (ASD) and neurotypical (NT) speakers might be explained by identifying different acoustic profiles of speakers which, while still perceived as atypical, might be characterized by different acoustic qualities.…

Descriptors: Network Analysis, Autism Spectrum Disorders, Intonation, Suprasegmentals

Towards the Automatic Risk of Bias Assessment on Randomized Controlled Trials: A Comparison of RobotReviewer and Humans

Peer reviewed

Direct link

Yuan Tian; Xi Yang; Suhail A. Doi; Luis Furuya-Kanamori; Lifeng Lin; Joey S. W. Kwong; Chang Xu – Research Synthesis Methods, 2024

RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding the risk of bias assessment based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two…

Descriptors: Risk, Randomized Controlled Trials, Classification, Robotics

Can Large Language Models Replace Humans in Systematic Reviews? Evaluating GPT-4's Efficacy in Screening and Extracting Data from Peer-Reviewed and Grey Literature in Multiple Languages

Peer reviewed

Direct link

Qusai Khraisha; Sophie Put; Johanna Kappenberg; Azza Warraitch; Kristin Hadfield – Research Synthesis Methods, 2024

Systematic reviews are vital for guiding practice, research and policy, although they are often slow and labour-intensive. Large language models (LLMs) could speed up and automate systematic reviews, but their performance in such tasks has yet to be comprehensively evaluated against humans, and no study has tested Generative Pre-Trained…

Descriptors: Peer Evaluation, Research Reports, Artificial Intelligence, Computer Software

Grading the Graders: Comparing Generative AI and Human Assessment in Essay Evaluation

Peer reviewed

Direct link

Elizabeth L. Wetzler; Kenneth S. Cassidy; Margaret J. Jones; Chelsea R. Frazier; Nickalous A. Korbut; Chelsea M. Sims; Shari S. Bowen; Michael Wood – Teaching of Psychology, 2025

Background: Generative artificial intelligence (AI) represents a potentially powerful, time-saving tool for grading student essays. However, little is known about how AI-generated essay scores compare to human instructor scores. Objective: The purpose of this study was to compare the essay grading scores produced by AI with those of human…

Descriptors: Essays, Writing Evaluation, Scores, Evaluators

The Relationship between Kenexa Scores and the Texas Teacher Evaluation System

Direct link

Christopher D. Daniel – ProQuest LLC, 2024

Districts spend thousands of dollars on computerized teacher screeners without knowing if they are identifying the most effective teacher. Hiring quality staff is one of the most important job functions of a principal, and many times a teacher screener score may eliminate an effective teacher. The current study examined the value of teacher…

Descriptors: Teacher Evaluation, Scores, Screening Tests, Teacher Effectiveness

A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement

Peer reviewed

Direct link

Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024

Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…

Descriptors: Semantics, Educational Assessment, Evaluators, Reliability

Examining the Effect of Item Difficulty and Rater Leniency on Iranian Test Takers' Performance on WDCT and DSAT: A Comparative Study

Peer reviewed
PDF on ERIC

Download full text

Reza Shahi; Hamdollah Ravand; Golam Reza Rohani – International Journal of Language Testing, 2025

The current paper intends to exploit the Many Facet Rasch Model to investigate and compare the impact of situations (items) and raters on test takers' performance on the Written Discourse Completion Test (WDCT) and Discourse Self-Assessment Tests (DSAT). In this study, the participants were 110 English as a Foreign Language (EFL) students at…

Descriptors: Comparative Analysis, English (Second Language), Second Language Learning, Second Language Instruction

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed
PDF on ERIC

Download full text

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

Examining the Effect of Assessment Construct Characteristics on Machine Learning Scoring of Scientific Argumentation

Peer reviewed

Direct link

Kevin C. Haudek; Xiaoming Zhai – International Journal of Artificial Intelligence in Education, 2024

Argumentation, a key scientific practice presented in the "Framework for K-12 Science Education," requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open response assessments, leveraging…

Descriptors: Accuracy, Persuasive Discourse, Artificial Intelligence, Learning Management Systems

Content and Item Response Theory Analysis of ChatGPT-4-Generated Multiple-Choice Items

Peer reviewed

Direct link

Roger Young; Emily Courtney; Alexander Kah; Mariah Wilkerson; Yi-Hsin Chen – Teaching of Psychology, 2025

Background: Multiple-choice item (MCI) assessments are burdensome for instructors to develop. Artificial intelligence (AI, e.g., ChatGPT) can streamline the process without sacrificing quality. The quality of AI-generated MCIs and human experts is comparable. However, whether the quality of AI-generated MCIs is equally good across various domain-…

Descriptors: Item Response Theory, Multiple Choice Tests, Psychology, Textbooks

Utilizing Large Language Models for EFL Essay Grading: An Examination of Reliability and Validity in Rubric-Based Assessments

Peer reviewed

Direct link

Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025

This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics

Automated Essay Scoring and Revising Based on Open-Source Large Language Models

Peer reviewed

Direct link

Yishen Song; Qianta Zhu; Huaibo Wang; Qinhua Zheng – IEEE Transactions on Learning Technologies, 2024

Manually scoring and revising student essays has long been a time-consuming task for educators. With the rise of natural language processing techniques, automated essay scoring (AES) and automated essay revising (AER) have emerged to alleviate this burden. However, current AES and AER models require large amounts of training data and lack…

Descriptors: Scoring, Essays, Writing Evaluation, Computer Software

Combining Human and Automated Scoring Methods in Experimental Assessments of Writing: A Case Study Tutorial

Peer reviewed

Direct link

Reagan Mozer; Luke Miratrix; Jackie Eunjung Relyea; James S. Kim – Journal of Educational and Behavioral Statistics, 2024

In a randomized trial that collects text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by human raters. An impact analysis can then be conducted to compare treatment and control groups, using the hand-coded scores as a measured outcome. This…

Descriptors: Scoring, Evaluation Methods, Writing Evaluation, Comparative Analysis

Method-of-Moment Corrected Maximum Likelihood (Ml) Structural-after-Measurement (SAM) Estimator for n-Level Structural Equation Models

Peer reviewed

Direct link

Fangxing Bai; Ben Kelcey – Society for Research on Educational Effectiveness, 2024

Purpose and Background: Despite the flexibility of multilevel structural equation modeling (MLSEM), a practical limitation many researchers encounter is how to effectively estimate model parameters with typical sample sizes when there are many levels of (potentially disparate) nesting. We develop a method-of-moment corrected maximum likelihood…

Descriptors: Maximum Likelihood Statistics, Structural Equation Models, Sample Size, Faculty Development

Previous Page | Next Page »

Pages: 1 | 2

Ahmet Can Uyar	1
Alexander Kah	1
Allan S. Cohen	1
Amanda Huee-Ping Wong	1
Azza Warraitch	1
Ben Kelcey	1
Chang Xu	1
Chelsea M. Sims	1
Chelsea R. Frazier	1
Christopher D. Daniel	1
Claire Touchie	1
Debra Pugh	1
Dilek Büyükahiska	1
Elizabeth L. Wetzler	1
Elizabeth Simmons	1
Emily Courtney	1
Ethan Weed	1
Fangxing Bai	1
Fatih Yavuz	1
Gamze Yavas Çelik	1
Golam Reza Rohani	1
Guangtian Zhu	1
Hamdollah Ravand	1
Huaibo Wang	1
Inge-Marie Eigsti	1
More ▼