ERIC - Search Results

Showing 1 to 15 of 64 results

Comparing and Combining IRTree Models and Anchoring Vignettes in Addressing Response Styles

Peer reviewed

Mingfeng Xue; Ping Chen – Journal of Educational Measurement, 2025

Response styles pose great threats to psychological measurements. This research compares IRTree models and anchoring vignettes in addressing response styles and estimating the target traits. It also explores the potential of combining them at the item level and total-score level (ratios of extreme and middle responses to vignettes). Four models…

Descriptors: Item Response Theory, Models, Comparative Analysis, Vignettes

Cognitive Diagnosis Testlet Model for Multiple-Choice Items

Peer reviewed

Lei Guo; Wenjie Zhou; Xiao Li – Journal of Educational and Behavioral Statistics, 2024

The testlet design is very popular in educational and psychological assessments. This article proposes a new cognitive diagnosis model, the multiple-choice cognitive diagnostic testlet (MC-CDT) model for tests using testlets consisting of MC items. The MC-CDT model uses the original examinees' responses to MC items instead of dichotomously scored…

Descriptors: Multiple Choice Tests, Diagnostic Tests, Accuracy, Computer Software

IRT Characteristic Curve Linking Methods Weighted by Information for Mixed-Format Tests

Peer reviewed

Shaojie Wang; Won-Chan Lee; Minqiang Zhang; Lixin Yuan – Applied Measurement in Education, 2024

To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…

Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement

Examining the Effect of Item Difficulty and Rater Leniency on Iranian Test Takers' Performance on WDCT and DSAT: A Comparative Study

Peer reviewed

Reza Shahi; Hamdollah Ravand; Golam Reza Rohani – International Journal of Language Testing, 2025

The current paper intends to exploit the Many Facet Rasch Model to investigate and compare the impact of situations (items) and raters on test takers' performance on the Written Discourse Completion Test (WDCT) and Discourse Self-Assessment Tests (DSAT). In this study, the participants were 110 English as a Foreign Language (EFL) students at…

Descriptors: Comparative Analysis, English (Second Language), Second Language Learning, Second Language Instruction

An Evaluation of Fit Indices Used in Model Selection of Dichotomous Mixture IRT Models

Peer reviewed

Sedat Sen; Allan S. Cohen – Educational and Psychological Measurement, 2024

A Monte Carlo simulation study was conducted to compare fit indices used for detecting the correct latent class in three dichotomous mixture item response theory (IRT) models. Ten indices were considered: Akaike's information criterion (AIC), the corrected AIC (AICc), Bayesian information criterion (BIC), consistent AIC (CAIC), Draper's…

Descriptors: Goodness of Fit, Item Response Theory, Sample Size, Classification

How Useful Is Comparative Judgement of Item Difficulty for Standard Maintaining?

Benton, Tom – Research Matters, 2020

This article reviews the evidence on the extent to which experts' perceptions of item difficulties, captured using comparative judgement, can predict empirical item difficulties. This evidence is drawn from existing published studies on this topic and also from statistical analysis of data held by Cambridge Assessment. Having reviewed the…

Descriptors: Test Items, Difficulty Level, Expertise, Comparative Analysis

Modeling and Analyzing Scorer Preferences in Short-Answer Math Questions

Peer reviewed

Zhang, Mengxue; Heffernan, Neil; Lan, Andrew – International Educational Data Mining Society, 2023

Automated scoring of student responses to open-ended questions, including short-answer questions, has great potential to scale to a large number of responses. Recent approaches for automated scoring rely on supervised learning, i.e., training classifiers or fine-tuning language models on a small number of responses with human-provided score…

Descriptors: Scoring, Computer Assisted Testing, Mathematics Instruction, Mathematics Tests

Diagnostic Classification Model for Forced-Choice Items and Noncognitive Tests

Peer reviewed

Huang, Hung-Yu – Educational and Psychological Measurement, 2023

The forced-choice (FC) item formats used for noncognitive tests typically develop a set of response options that measure different traits and instruct respondents to make judgments among these options in terms of their preference to control the response biases that are commonly observed in normative tests. Diagnostic classification models (DCMs)…

Descriptors: Test Items, Classification, Bayesian Statistics, Decision Making

Two IRT Characteristic Curve Linking Methods Weighted by Information

Peer reviewed

Wang, Shaojie; Zhang, Minqiang; Lee, Won-Chan; Huang, Feifei; Li, Zonglong; Li, Yixing; Yu, Sufang – Journal of Educational Measurement, 2022

Traditional IRT characteristic curve linking methods ignore parameter estimation errors, which may undermine the accuracy of estimated linking constants. Two new linking methods are proposed that take into account parameter estimation errors. The item- (IWCC) and test-information-weighted characteristic curve (TWCC) methods employ weighting…

Descriptors: Item Response Theory, Error of Measurement, Accuracy, Monte Carlo Methods

Reliability and Validity Evidence of Diagnostic Methods: Comparison of Diagnostic Classification Models and Item Response Theory-Based Methods

Yoo Jeong Jang – ProQuest LLC, 2022

Despite the increasing demand for diagnostic information, observed subscores have been often reported to lack adequate psychometric qualities such as reliability, distinctiveness, and validity. Therefore, several statistical techniques based on CTT and IRT frameworks have been proposed to improve the quality of subscores. More recently, DCM has…

Descriptors: Classification, Accuracy, Item Response Theory, Correlation

ChatGPT-4o, ChatGPT-4 and Google Gemini are Compared with Students: A Study in Higher Education

Peer reviewed

Harun Bayer; Fazilet Gül Ince Araci; Gülsah Gürkan – International Journal of Technology in Education and Science, 2024

The rapid advancement of artificial intelligence technologies, their pervasive use in every field, and the growing understanding of the benefits they bring have led actors in the education sector to pursue research in this field. In particular, the use of artificial intelligence tools has become more prevalent in the education sector due to the…

Descriptors: Artificial Intelligence, Computer Software, Computational Linguistics, Technology Uses in Education

Does Comparative Judgement of Scripts Provide an Effective Means of Maintaining Standards in Mathematics? Research Report

Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020

In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al (2020) describes a method…

Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level

Changes in the Speed-Ability Relation through Different Treatments of Rapid Guessing

Peer reviewed

Deribo, Tobias; Goldhammer, Frank; Kroehne, Ulf – Educational and Psychological Measurement, 2023

As researchers in the social sciences, we are often interested in studying not directly observable constructs through assessments and questionnaires. But even in a well-designed and well-implemented study, rapid-guessing behavior may occur. Under rapid-guessing behavior, a task is skimmed shortly but not read and engaged with in-depth. Hence, a…

Descriptors: Reaction Time, Guessing (Tests), Behavior Patterns, Bias

A Simulation Study to Compare Nonequivalent Groups with Anchor Test Equating and Pseudo-Equivalent Group Linking. Research Report. ETS RR-18-08

Peer reviewed

Lu, Ru; Guo, Hongwen – ETS Research Report Series, 2018

In this paper we compare the newly developed pseudo-equivalent groups (PEG) linking method with the linking methods based on the traditional nonequivalent groups with anchor test (NEAT) design and illustrate how to use the PEG methods under imperfect equating conditions. To do this, we proposed a new method that combines the features of PEG…

Descriptors: Equated Scores, Comparative Analysis, Test Items, Background

Can Valuable Information Be Prioritized in Verbal Working Memory?

Peer reviewed

Atkinson, Amy L.; Allen, Richard J.; Baddeley, Alan D.; Hitch, Graham J.; Waterman, Amanda H. – Journal of Experimental Psychology: Learning, Memory, and Cognition, 2021

Though there is substantial evidence that individuals can prioritize more valuable information in visual working memory (WM), little research has examined this in the verbal domain. Four experiments were conducted to investigate this and the conditions under which effects emerge. In each experiment, participants listened to digit sequences and…

Descriptors: Verbal Communication, Short Term Memory, Task Analysis, Recall (Psychology)
