ERIC - Search Results

Publication Date

In 2025

Descriptor

Comparative Analysis	11
Foreign Countries	6
Reliability	5
Test Reliability	5
English (Second Language)	4
Scores	4
Test Items	4
Accuracy	3
Artificial Intelligence	3
Computational Linguistics	3
Correlation	3
Evaluators	3
Item Analysis	3
Language Tests	3
Second Language Instruction	3
Second Language Learning	3
Writing Evaluation	3
Computer Software	2
Elementary School Students	2
Essays	2
Evaluation Methods	2
Item Response Theory	2
Language Teachers	2
Models	2
Scoring	2

Source

Language Testing	2
Anatomical Sciences Education	1
British Journal of…	1
Educational Process:…	1
International Journal of…	1
International Journal of…	1
Journal of Baltic Science…	1
Journal of Educational…	1
Language Assessment Quarterly	1
Language Learning	1

Publication Type

Journal Articles	11
Reports - Research	10
Information Analyses	1
Tests/Questionnaires	1

Education Level

Higher Education	4
Postsecondary Education	4
Elementary Education	2
Secondary Education	2

Location

Iran	2
Austria	1
China	1
Jordan	1
Vietnam	1

Showing all 11 results

Comparing and Combining IRTree Models and Anchoring Vignettes in Addressing Response Styles

Peer reviewed

Mingfeng Xue; Ping Chen – Journal of Educational Measurement, 2025

Response styles pose great threats to psychological measurements. This research compares IRTree models and anchoring vignettes in addressing response styles and estimating the target traits. It also explores the potential of combining them at the item level and total-score level (ratios of extreme and middle responses to vignettes). Four models…

Descriptors: Item Response Theory, Models, Comparative Analysis, Vignettes

A Comparison of Yen's Q3 Coefficient and Rasch Testlet Modeling for Identifying Local Item Dependence: Evidence from Two Vocabulary Matching Tests

Peer reviewed

Hung Tan Ha; Duyen Thi Bich Nguyen; Tim Stoeckel – Language Assessment Quarterly, 2025

This article compares two methods for detecting local item dependence (LID): residual correlation examination and Rasch testlet modeling (RTM), in a commonly used 3:6 matching format and an extended matching test (EMT) format. The two formats are hypothesized to facilitate different levels of item dependency due to differences in the number of…

Descriptors: Comparative Analysis, Language Tests, Test Items, Item Analysis
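Yen's Q3, the residual-correlation method named in the abstract above, correlates, across test takers, the residuals (observed score minus model-expected score) of each pair of items. The following is a minimal sketch, assuming dichotomous items and person abilities and item difficulties already estimated from a Rasch model. The function names and toy data are hypothetical, and this is not the authors' analysis code.

```python
import math
from itertools import combinations

def rasch_prob(theta, b):
    """Probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((c - my) ** 2 for c in y)
    return sxy / math.sqrt(sxx * syy)

def q3_matrix(responses, thetas, difficulties):
    """Yen's Q3 for every item pair.

    responses[p][i] is person p's 0/1 score on item i; thetas and difficulties
    are assumed to come from an already-fitted Rasch model.
    """
    n_persons, n_items = len(thetas), len(difficulties)
    # Residual = observed score minus model-expected score, per item, across persons
    residuals = [
        [responses[p][i] - rasch_prob(thetas[p], difficulties[i]) for p in range(n_persons)]
        for i in range(n_items)
    ]
    return {(i, j): pearson(residuals[i], residuals[j])
            for i, j in combinations(range(n_items), 2)}

# Hypothetical toy data: items 0 and 1 always receive the same answer
responses = [[0, 0, 1], [0, 0, 0], [1, 1, 0], [1, 1, 1],
             [1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
thetas = [-1.5, -0.5, 0.0, 0.5, 1.5, -1.0, 1.0, 0.2]
difficulties = [0.0, 0.0, 0.0]
for pair, value in q3_matrix(responses, thetas, difficulties).items():
    print(pair, round(value, 3))
```

An item pair whose Q3 is high relative to the other pairs points to possible local item dependence. The cut-off used to flag such pairs varies across the literature and is not taken from the article.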

A Methodological Review of Listening Comprehension Tests for Primary School Children

Peer reviewed

Kiri Mealings; Kelly Miles; Joerg M. Buchholz – International Journal of Listening, 2025

A child's ability to comprehend speech in the mainstream classroom is vital for intellectual and social development. However, listening conditions are often sub-optimal; the presence of multiple talkers, high noise levels, and long reverberation times add to the challenge of listening with a developing auditory system. An assessment that captures…

Descriptors: Elementary School Students, Listening Comprehension Tests, Comparative Analysis, Speech Communication

Comparative Analysis of LLMs Performance in Medical Embryology: A Cross-Platform Study of ChatGPT, Claude, Gemini, and Copilot

Peer reviewed

Olena Bolgova; Paul Ganguly; Volodymyr Mavrych – Anatomical Sciences Education, 2025

Integrating artificial intelligence, particularly large language models (LLMs), into medical education represents a significant new step in how medical knowledge is accessed, processed, and evaluated. The objective of this study was to conduct a comprehensive analysis comparing the performance of advanced LLM chatbots in different topics of…

Descriptors: Comparative Analysis, Artificial Intelligence, Technology Uses in Education, Natural Language Processing

Assessing the Impact of Predictive Thinking-Based Learning Activities on Enhancing Creative Writing in Language Learning Classrooms

Peer reviewed
PDF on ERIC

Ali Al-Barakat; Rommel AlAli; Omayya Al-Hassan; Khaled Al-Saud – Educational Process: International Journal, 2025

Background/purpose: The study tries to discover how predictive thinking can be incorporated into writing activities to assist students in developing their creative skills in writing learning environments. Through this study, teachers will be able to adopt a new teaching method that helps transform the way creative writing is taught in language…

Descriptors: Thinking Skills, Creative Writing, Writing Instruction, Validity

Simulating the Relationship between Nonword Repetition Performance and Vocabulary Growth in 2-Year-Olds: Evidence from the Language 0-5 Project

Peer reviewed

Direct link

Caroline F. Rowland; Amy Bidgood; Gary Jones; Andrew Jessop; Paula Stinson; Julian M. Pine; Samantha Durrant; Michelle S. Peter – Language Learning, 2025

A strong predictor of children's language is performance on non-word repetition (NWR) tasks. However, the basis of this relationship remains unknown. Some suggest that NWR tasks measure phonological working memory, which then affects language growth. Others argue that children's knowledge of language/language experience affects NWR performance. A…

Descriptors: Vocabulary Development, Comparative Analysis, Computational Linguistics, Language Skills

Examining the Effect of Item Difficulty and Rater Leniency on Iranian Test Takers' Performance on WDCT and DSAT: A Comparative Study

Peer reviewed
PDF on ERIC

Reza Shahi; Hamdollah Ravand; Golam Reza Rohani – International Journal of Language Testing, 2025

The current paper intends to exploit the Many Facet Rasch Model to investigate and compare the impact of situations (items) and raters on test takers' performance on the Written Discourse Completion Test (WDCT) and Discourse Self-Assessment Tests (DSAT). In this study, the participants were 110 English as a Foreign Language (EFL) students at…

Descriptors: Comparative Analysis, English (Second Language), Second Language Learning, Second Language Instruction

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed
PDF on ERIC

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

Utilizing Large Language Models for EFL Essay Grading: An Examination of Reliability and Validity in Rubric-Based Assessments

Peer reviewed

Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025

This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics

Comparative Judgement for Evaluating Young Learners' EFL Writing Performances: Reliability and Teacher Perceptions of Holistic and Dimension-Based Judgements

Peer reviewed

Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025

Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…

Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
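The abstract above describes Comparative Judgement as building a rank order and scores from judges' pairwise comparisons. CJ scores are commonly estimated with a Bradley-Terry model. The sketch below assumes that model, fitted with the MM (Zermelo) iteration, because the excerpt does not say which estimation the study used. The function name and the three hypothetical scripts "A", "B" and "C" are illustrative only.

```python
import math
from collections import defaultdict

def bradley_terry(comparisons, max_iter=1000, tol=1e-10):
    """Estimate script quality from (winner, loser) judgements.

    Uses the MM (Zermelo) iteration for the Bradley-Terry model. Assumes no
    self-comparisons and that every script both wins and loses at least once,
    otherwise the maximum-likelihood estimates are not finite.
    """
    scripts = sorted({s for pair in comparisons for s in pair})
    wins = defaultdict(int)
    pair_counts = defaultdict(int)  # times each unordered pair was judged
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {s: 1.0 for s in scripts}
    for _ in range(max_iter):
        updated = {}
        for i in scripts:
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (strength[i] + strength[j])
            updated[i] = wins[i] / denom
        # Rescale so the geometric mean strength is 1 (the scale origin is arbitrary)
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        updated = {s: v / math.exp(log_mean) for s, v in updated.items()}
        done = max(abs(updated[s] - strength[s]) for s in scripts) < tol
        strength = updated
        if done:
            break
    # Scores on the logit scale; the rank order is obtained by sorting them
    return {s: math.log(v) for s, v in strength.items()}

# Hypothetical judgements on three scripts
judgements = ([("A", "B")] * 3 + [("B", "A")] + [("B", "C")] * 3 + [("C", "B")]
              + [("A", "C")] * 3 + [("C", "A")])
scores = bradley_terry(judgements)
print(sorted(scores, key=scores.get, reverse=True))  # → ['A', 'B', 'C']
```

Holistic and dimension-based judgements, as compared in the study, would each produce their own list of comparisons and therefore their own set of scores.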

A New Scoring Method for Item Response Theory Analysis of C-Tests

Peer reviewed

Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025

This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as sentence score, and then each sentence was entered into the analysis as a polytomous…

Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction
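The sentence-level scoring described in the abstract above sums the correctly restored gaps within each sentence, so each sentence becomes one polytomous item. The following is a minimal sketch of that aggregation step only, assuming gap scores are coded 0/1 and each gap's sentence is known. The function and variable names are hypothetical.

```python
from collections import Counter

def sentence_scores(gap_scores, gap_to_sentence):
    """Collapse dichotomous C-Test gap scores into polytomous sentence scores.

    gap_scores[k] is 1 if gap k was correctly restored, else 0;
    gap_to_sentence[k] is the (hypothetical) index of the sentence holding gap k.
    Returns (score per sentence, maximum possible score per sentence).
    """
    if len(gap_scores) != len(gap_to_sentence):
        raise ValueError("every gap needs exactly one sentence index")
    scores = Counter()
    max_scores = Counter(gap_to_sentence)  # number of gaps in each sentence
    for score, sentence in zip(gap_scores, gap_to_sentence):
        scores[sentence] += score
    # Sentences with no correct gaps still need an explicit 0 category
    return ({s: scores[s] for s in sorted(max_scores)},
            {s: max_scores[s] for s in sorted(max_scores)})

# Hypothetical passage: eight gaps spread over three sentences
gaps = [1, 0, 1, 1, 1, 0, 0, 1]
sentence_of_gap = [0, 0, 0, 1, 1, 2, 2, 2]
print(sentence_scores(gaps, sentence_of_gap))
# → ({0: 2, 1: 2, 2: 1}, {0: 3, 1: 2, 2: 3})
```

Each sentence score ranges from 0 to its number of gaps and could then be analysed with a polytomous IRT model such as the partial credit model. The abstract is truncated before naming the model, so that choice is an assumption.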

Author

Ali Al-Barakat	1
Amy Bidgood	1
Andrew Jessop	1
Caroline F. Rowland	1
Duyen Thi Bich Nguyen	1
Esmat Babaii	1
Farshad Effatpanah	1
Fatih Yavuz	1
Gamze Yavas Çelik	1
Gary Jones	1
Golam Reza Rohani	1
Guangtian Zhu	1
Hamdollah Ravand	1
Hung Tan Ha	1
Jianwen Xiong	1
Joerg M. Buchholz	1
John Pill	1
Julian M. Pine	1
Kelly Miles	1
Khaled Al-Saud	1
Kiri Mealings	1
Lin Liu	1
Michelle S. Peter	1
Mingfeng Xue	1
Mona Tabatabaee-Yazdi	1