ERIC - Search Results

Publication Date

In 2025	1
Since 2024	5
Since 2021 (last 5 years)	19
Since 2016 (last 10 years)	30
Since 2006 (last 20 years)	43

Descriptor

Comparative Analysis	48
Correlation	48
Evaluators	48
Second Language Learning	19
Foreign Countries	18
English (Second Language)	17
Scoring	13
Interrater Reliability	12
Writing Evaluation	12
Second Language Instruction	10
Computer Software	9
Reliability	9
Scores	9
Essays	8
Speech Communication	8
Statistical Analysis	8
Teaching Methods	8
Accuracy	7
Computer Assisted Testing	7
Evaluation Methods	7
Language Tests	7
Artificial Intelligence	6
Audio Equipment	6
Computational Linguistics	6
Grading	6

Publication Type

Journal Articles	38
Reports - Research	37
Reports - Evaluative	9
Speeches/Meeting Papers	3
Tests/Questionnaires	3
Dissertations/Theses -…	2
Information Analyses	2

Education Level

Higher Education	11
Postsecondary Education	11
Early Childhood Education	2
Elementary Secondary Education	2
Primary Education	2
Secondary Education	2
Elementary Education	1
Grade 1	1
Grade 11	1
Grade 3	1
Kindergarten	1
Middle Schools	1
Preschool Education	1
Two Year Colleges	1

Location

China	5
Argentina	1
Canada	1
Europe	1
Florida	1
Germany	1
Hong Kong	1
India	1
Iran	1
Japan	1
Mexico	1
Netherlands	1
Singapore	1
Texas	1
Turkey	1
United Kingdom	1
United States	1

Laws, Policies, & Programs

Higher Education Act 1965

Assessments and Surveys

International English…	2
Rosenberg Self Esteem Scale	1
Test of English as a Foreign…	1

Showing 1 to 15 of 48 results

A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement

Peer reviewed

Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024

Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…

Descriptors: Semantics, Educational Assessment, Evaluators, Reliability

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

Rater Connections and the Detection of Bias in Performance Assessment

Peer reviewed

Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022

In many performance assessments, one or two raters from the complete rater pool score each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…

Descriptors: Evaluators, Bias, Identification, Performance Based Assessment

The Concurrent Validity of Comparative Judgement Outcomes Compared with Marks

Gill, Tim – Research Matters, 2022

In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them in terms of which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…

Descriptors: Comparative Analysis, Decision Making, Scripts, Standards

Cultural Competence: 10-Year Comparison of Program Evaluators' Perceptions

Peer reviewed

Dunaway, Krystall; Gardner, Kristine; Grieve, Karly – American Journal of Evaluation, 2023

As part of its "Guiding Principles for Evaluators," the American Evaluation Association (AEA) requires that evaluators develop cultural competencies. Using a successive-independent-samples design, the researchers sought to compare perceptions of cultural competence across a duration of 10 years. Qualitative data were collected via online…

Descriptors: Cultural Awareness, Program Evaluation, Evaluators, Preferences

Method-of-Moment Corrected Maximum Likelihood (ML) Structural-after-Measurement (SAM) Estimator for n-Level Structural Equation Models

Peer reviewed

Fangxing Bai; Ben Kelcey – Society for Research on Educational Effectiveness, 2024

Purpose and Background: Despite the flexibility of multilevel structural equation modeling (MLSEM), a practical limitation many researchers encounter is how to effectively estimate model parameters with typical sample sizes when there are many levels of (potentially disparate) nesting. We develop a method-of-moment corrected maximum likelihood…

Descriptors: Maximum Likelihood Statistics, Structural Equation Models, Sample Size, Faculty Development

The Intersection of AI and Language Assessment: A Study on the Reliability of ChatGPT in Grading IELTS Writing Task 2

Peer reviewed

Osama Koraishi – Language Teaching Research Quarterly, 2024

This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…

Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence

Meta-Analysis of Inter-Rater Agreement and Discrepancy Between Human and Automated English Essay Scoring

Peer reviewed

Jiyeo Yun – English Teaching, 2023

Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…

Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring

Automated Assessment of Second Language Comprehensibility: Review, Training, Validation, and Generalization Studies

Peer reviewed

Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki – Studies in Second Language Acquisition, 2023

Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward…

Descriptors: Second Language Learning, Second Language Instruction, Interrater Reliability, Speech Communication

Assessing Second-Language Academic Writing: AI vs. Human Raters

Peer reviewed

Vasfiye Geçkin; Ebru Kiziltas; Çagatay Çinar – Journal of Educational Technology and Online Learning, 2023

The quality of writing in a second language (L2) is one of the indicators of the level of proficiency for many college students to be eligible for departmental studies. Although certain software programs, such as Intelligent Essay Assessor or IntelliMetric, have been introduced to evaluate second-language writing quality, an overall assessment of…

Descriptors: Writing Evaluation, Second Language Learning, Second Language Instruction, Language Proficiency

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards

The Dual Personality of 'Topic' in the IELTS Speaking Test

Peer reviewed

Seedhouse, Paul – ELT Journal, 2019

This article investigates the central role of topic in the IELTS Speaking Test (IST). Topic has developed a dual personality in this interactional setting: topic-as-script is the scripted statement of topic on the examiner's cards prior to the interaction, whereas topic-as-action is how topic is developed by the candidate during the course of the…

Descriptors: English (Second Language), Language Tests, Second Language Learning, Personality Traits

Assumptions of Speaker Ethnicity and the Effect on Ratings of Accentedness, Comprehensibility, and Intelligibility

Peer reviewed

Lee, Bradford J.; Bailey, Justin L. – Language Awareness, 2023

While listeners tend to downgrade speakers' accent and comprehensibility when they perceive them to be from a different language community--a process known as reverse linguistic stereotyping (RLS)--research has generally relied solely on quantitative data such as Likert scale ratings. The current study sought to extend the analysis further by…

Descriptors: Likert Scales, Stereotypes, Ethnicity, Intelligibility

Variation in Speech Intelligibility Ratings as a Function of Speech Rate Modification in Parkinson's Disease

Peer reviewed

Knowles, Thea; Adams, Scott G.; Jog, Mandar – Journal of Speech, Language, and Hearing Research, 2021

Purpose: The aim of this study was to quantify changes in speech intelligibility in two cohorts of people with Parkinson's disease (PD; those with and without deep brain stimulation [DBS]) across a broad range of self-selected speech rate alterations in: (1) read sentences; and (2) extemporaneous speech (monologues). Method: Four speaker groups…

Descriptors: Diseases, Speech Communication, Speech Impairments, Intelligibility

Validation of an Automated Procedure for Calculating Core Lexicon from Transcripts

Peer reviewed

Dalton, Sarah Grace; Stark, Brielle C.; Fromm, Davida; Apple, Kristen; MacWhinney, Brian; Rensch, Amanda; Rowedder, Madyson – Journal of Speech, Language, and Hearing Research, 2022

Purpose: The aim of this study was to advance the use of structured, monologic discourse analysis by validating an automated scoring procedure for core lexicon (CoreLex) using transcripts. Method: Forty-nine transcripts from persons with aphasia and 48 transcripts from persons with no brain injury were retrieved from the AphasiaBank database. Five…

Descriptors: Validity, Discourse Analysis, Databases, Scoring

Source

Studies in Second Language…	3
Applied Measurement in…	2
ETS Research Report Series	2
Journal of Speech, Language,…	2
Online Submission	2
ProQuest LLC	2
Advances in Language and…	1
Advances in Physiology…	1
American Journal of Evaluation	1
CALICO Journal	1
Canadian Modern Language…	1
Clinical Linguistics &…	1
Computer Assisted Language…	1
Creativity Research Journal	1
ELT Journal	1
Educational Research and…	1
English Teaching	1
European Early Childhood…	1
Eurydice	1
Grantee Submission	1
Hispania	1
ITHAKA S+R	1
International Educational…	1
Journal of Baltic Science…	1
Journal of Clinical Child and…	1

Author

Coniam, David	2
Linn, Robert L.	2
McDonough, Kim	2
Trofimovich, Pavel	2
Abashidze, Dato	1
Abdul Gafoor, K.	1
Adams, Scott G.	1
Aggarwal, Varun	1
Allan S. Cohen	1
Amanda Huee-Ping Wong	1
Apple, Kristen	1
Attali, Yigal	1
Baidak, Nathalie	1
Bailey, Justin L.	1
Barnes, Marcia A.	1
Ben Kelcey	1
Berry, Katherine A.	1
Biro, Frank M.	1
Brannen, Kathleen	1
Breyer, F. Jay	1
Cardoso, Walcir	1
Cauble, Mary	1
Chen, Sunny	1
Childress, Cameron	1