ERIC - Search Results

Publication Date

In 2025	1
Since 2024	4
Since 2021 (last 5 years)	9
Since 2016 (last 10 years)	19
Since 2006 (last 20 years)	20

Descriptor

International Assessment	20
Achievement Tests	18
Foreign Countries	18
Secondary School Students	15
Item Response Theory	9
Test Items	9
Models	8
Simulation	6
Scores	5
Mathematics Tests	4
Monte Carlo Methods	4
Reaction Time	4
Reading Tests	4
Responses	4
Science Tests	4
Bayesian Statistics	3
Error of Measurement	3
Evaluation Methods	3
Gender Differences	3
Measurement	3
Science Achievement	3
Test Bias	3
Testing Problems	3
Accuracy	2
Comparative Analysis	2
More ▼

Source

Journal of Educational…

Publication Type

Journal Articles	20
Reports - Research	15
Reports - Descriptive	4
Reports - Evaluative	1

Education Level

Secondary Education	16
Elementary Secondary Education	2
Elementary Education	1
Grade 4	1
Intermediate Grades	1

Audience

Location

Laws, Policies, & Programs

Assessments and Surveys

Program for International…	16
Trends in International…	2
National Assessment of…	1
Program for the International…	1
Progress in International…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 20 results Save | Export

MSAEM Estimation for Confirmatory Multidimensional Four-Parameter Normal Ogive Models

Peer reviewed

Direct link

Jia Liu; Xiangbin Meng; Gongjun Xu; Wei Gao; Ningzhong Shi – Journal of Educational Measurement, 2024

In this paper, we develop a mixed stochastic approximation expectation-maximization (MSAEM) algorithm coupled with a Gibbs sampler to compute the marginalized maximum a posteriori estimate (MMAPE) of a confirmatory multidimensional four-parameter normal ogive (M4PNO) model. The proposed MSAEM algorithm not only has the computational advantages of…

Descriptors: Algorithms, Achievement Tests, Foreign Countries, International Assessment

Incorporating Test-Taking Engagement into Multistage Adaptive Testing Design for Large-Scale Assessments

Peer reviewed

Direct link

Okan Bulut; Guher Gorgun; Hacer Karamese – Journal of Educational Measurement, 2025

The use of multistage adaptive testing (MST) has gradually increased in large-scale testing programs as MST achieves a balanced compromise between linear test design and item-level adaptive testing. MST works on the premise that each examinee gives their best effort when attempting the items, and their responses truly reflect what they know or can…

Descriptors: Response Style (Tests), Testing Problems, Testing Accommodations, Measurement

DIF Detection for Multiple Groups: Comparing Three-Level GLMMs and Multiple-Group IRT Models

Peer reviewed

Direct link

Carmen Köhler; Lale Khorramdel; Artur Pokropek; Johannes Hartig – Journal of Educational Measurement, 2024

For assessment scales applied to different groups (e.g., students from different states; patients in different countries), multigroup differential item functioning (MG-DIF) needs to be evaluated in order to ensure that respondents with the same trait level but from different groups have equal response probabilities on a particular item. The…

Descriptors: Measures (Individuals), Test Bias, Models, Item Response Theory

Differences in Time Usage as a Competing Hypothesis for Observed Group Differences in Accuracy with an Application to Observed Gender Differences in PISA Data

Peer reviewed

Direct link

Radhika Kapoor; Erin Fahle; Klint Kanopka; David Klinowski; Ana Trindade Ribeiro; Benjamin W. Domingue – Journal of Educational Measurement, 2024

Group differences in test scores are a key metric in education policy. Response time offers novel opportunities for understanding these differences, especially in low-stakes settings. Here, we describe how observed group differences in test accuracy can be attributed to group differences in latent response speed or group differences in latent…

Descriptors: Foreign Countries, Secondary School Students, Achievement Tests, International Assessment

Random Responders in the TIMSS 2015 Student Questionnaire: A Threat to Validity?

Peer reviewed

Direct link

van Laar, Saskia; Braeken, Johan – Journal of Educational Measurement, 2022

The low-stakes character of international large-scale educational assessments implies that a participating student might at times provide unrelated answers as if s/he was not even reading the items and choosing a response option randomly throughout. Depending on the severity of this invalid response behavior, interpretations of the assessment…

Descriptors: Achievement Tests, Elementary Secondary Education, International Assessment, Foreign Countries

Linking via Pseudo-Equivalent Group Design: Methodological Considerations and an Application to the PISA and PIACC Assessments

Peer reviewed

Direct link

Pokropek, Artur; Borgonovi, Francesca – Journal of Educational Measurement, 2020

This article presents the pseudo-equivalent group approach and discusses how it can enhance the quality of linking in the presence of nonequivalent groups. The pseudo-equivalent group approach allows to achieve pseudo-equivalence using propensity score reweighting techniques. We use it to perform linking to establish scale concordance between two…

Descriptors: Foreign Countries, Secondary School Students, Achievement Tests, International Assessment

Bias and Bias Correction Method for Nonproportional Abilities Requirement (NPAR) Tests

Peer reviewed

Direct link

Ip, Edward H.; Strachan, Tyler; Fu, Yanyan; Lay, Alexandra; Willse, John T.; Chen, Shyh-Huei; Rutkowski, Leslie; Ackerman, Terry – Journal of Educational Measurement, 2019

Test items must often be broad in scope to be ecologically valid. It is therefore almost inevitable that secondary dimensions are introduced into a test during test development. A cognitive test may require one or more abilities besides the primary ability to correctly respond to an item, in which case a unidimensional test score overestimates the…

Descriptors: Test Items, Test Bias, Test Construction, Scores

Use of Adjustment by Minimum Discriminant Information in Linking Constructed-Response Test Scores in the Absence of Common Items

Peer reviewed

Direct link

Lee, Yi-Hsuan; Haberman, Shelby J.; Dorans, Neil J. – Journal of Educational Measurement, 2019

In many educational tests, both multiple-choice (MC) and constructed-response (CR) sections are used to measure different constructs. In many common cases, security concerns lead to the use of form-specific CR items that cannot be used for equating test scores, along with MC sections that can be linked to previous test forms via common items. In…

Descriptors: Scores, Multiple Choice Tests, Test Items, Responses

On Joining a Signal Detection Choice Model with Response Time Models

Peer reviewed

Direct link

DeCarlo, Lawrence T. – Journal of Educational Measurement, 2021

In a signal detection theory (SDT) approach to multiple choice exams, examinees are viewed as choosing, for each item, the alternative that is perceived as being the most plausible, with perceived plausibility depending in part on whether or not an item is known. The SDT model is a process model and provides measures of item difficulty, item…

Descriptors: Perception, Bias, Theories, Test Items

Multiple-Group Joint Modeling of Item Responses, Response Times, and Action Counts with the Conway-Maxwell-Poisson Distribution

Peer reviewed

Direct link

Qiao, Xin; Jiao, Hong; He, Qiwei – Journal of Educational Measurement, 2023

Multiple group modeling is one of the methods to address the measurement noninvariance issue. Traditional studies on multiple group modeling have mainly focused on item responses. In computer-based assessments, joint modeling of response times and action counts with item responses helps estimate the latent speed and action levels in addition to…

Descriptors: Multivariate Analysis, Models, Item Response Theory, Statistical Distributions

Gender Bias in Test Item Formats: Evidence from PISA 2009, 2012, and 2015 Math and Reading Tests

Peer reviewed

Direct link

Shear, Benjamin R. – Journal of Educational Measurement, 2023

Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…

Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests

Explanatory Cognitive Diagnostic Modeling Incorporating Response Times

Peer reviewed

Direct link

Qiao, Xin; Jiao, Hong – Journal of Educational Measurement, 2021

This study proposes explanatory cognitive diagnostic model (CDM) jointly incorporating responses and response times (RTs) with the inclusion of item covariates related to both item responses and RTs. The joint modeling of item responses and RTs intends to provide more information for cognitive diagnosis while item covariates can be used to predict…

Descriptors: Cognitive Measurement, Models, Reaction Time, Test Items

A More Flexible Bayesian Multilevel Bifactor Item Response Theory Model

Peer reviewed

Direct link

Fujimoto, Ken A. – Journal of Educational Measurement, 2020

Multilevel bifactor item response theory (IRT) models are commonly used to account for features of the data that are related to the sampling and measurement processes used to gather those data. These models conventionally make assumptions about the portions of the data structure that represent these features. Unfortunately, when data violate these…

Descriptors: Bayesian Statistics, Item Response Theory, Achievement Tests, Secondary School Students

A Response Time Process Model for Not-Reached and Omitted Items

Peer reviewed

Direct link

Lu, Jing; Wang, Chun – Journal of Educational Measurement, 2020

Item nonresponses are prevalent in standardized testing. They happen either when students fail to reach the end of a test due to a time limit or quitting, or when students choose to omit some items strategically. Oftentimes, item nonresponses are nonrandom, and hence, the missing data mechanism needs to be properly modeled. In this paper, we…

Descriptors: Item Response Theory, Test Items, Standardized Tests, Responses

Sensitivity of the RMSD for Detecting Item-Level Misfit in Low-Performing Countries

Peer reviewed

Direct link

Tijmstra, Jesper; Bolsinova, Maria; Liaw, Yuan-Ling; Rutkowski, Leslie; Rutkowski, David – Journal of Educational Measurement, 2020

Although the root-mean squared deviation (RMSD) is a popular statistical measure for evaluating country-specific item-level misfit (i.e., differential item functioning [DIF]) in international large-scale assessment, this paper shows that its sensitivity to detect misfit may depend strongly on the proficiency distribution of the considered…

Descriptors: Test Items, Goodness of Fit, Probability, Accuracy

Previous Page | Next Page »

Pages: 1 | 2

Rutkowski, Leslie	3
Jiao, Hong	2
Qiao, Xin	2
Ackerman, Terry	1
Ana Trindade Ribeiro	1
Artur Pokropek	1
Benjamin W. Domingue	1
Bolsinova, Maria	1
Borgonovi, Francesca	1
Braeken, Johan	1
Carmen Köhler	1
Chen, Shyh-Huei	1
David Klinowski	1
De Boeck, Paul	1
DeCarlo, Lawrence T.	1
Debeer, Dries	1
Dorans, Neil J.	1
Erin Fahle	1
Fu, Yanyan	1
Fujimoto, Ken A.	1
Gongjun Xu	1
Greiff, Samuel	1
Guher Gorgun	1
Haag, Nicole	1
Haberman, Shelby J.	1
More ▼