Showing 1 to 15 of 21 results
Peer reviewed
Sooyong Lee; Suhwa Han; Seung W. Choi – Journal of Educational Measurement, 2024
Research has shown that multiple-indicator multiple-cause (MIMIC) models can result in inflated Type I error rates in detecting differential item functioning (DIF) when the assumption of equal latent variance is violated. This study explains how the violation of the equal variance assumption adversely impacts the detection of nonuniform DIF and…
Descriptors: Factor Analysis, Bayesian Statistics, Test Bias, Item Response Theory
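For context, a common MIMIC specification for DIF regresses both the latent variable and the item response on the grouping covariate; the following is a standard textbook sketch, not necessarily the authors' exact parameterization:

\[
y_i^* = \lambda_i \eta + \beta_i z + \omega_i (\eta z) + \varepsilon_i,
\qquad
\eta = \gamma z + \zeta,
\]

where z codes group membership, \beta_i \neq 0 indicates uniform DIF, \omega_i \neq 0 indicates nonuniform DIF, and \mathrm{Var}(\zeta) is conventionally constrained equal across groups, which is the assumption whose violation is at issue here.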
Peer reviewed
Strachan, Tyler; Cho, Uk Hyun; Kim, Kyung Yong; Willse, John T.; Chen, Shyh-Huei; Ip, Edward H.; Ackerman, Terry A.; Weeks, Jonathan P. – Journal of Educational Measurement, 2021
In vertical scaling, results of tests from several different grade levels are placed on a common scale. Most vertical scaling methodologies rely heavily on the assumption that the construct being measured is unidimensional. In many testing situations, however, such an assumption could be problematic. For instance, the construct measured at one…
Descriptors: Item Response Theory, Scaling, Tests, Construct Validity
Peer reviewed
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Peer reviewed
He, Yinhong – Journal of Educational Measurement, 2023
Back random responding (BRR) is a commonly observed form of careless responding. Accurately detecting BRR can improve test validity. Yu and Cheng (2019) showed that the change point analysis (CPA) procedure based on weighted residuals (CPA-WR) performed well in detecting BRR. Compared with the CPA procedure, the…
Descriptors: Test Validity, Item Response Theory, Measurement, Monte Carlo Methods
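As a rough illustration of the change-point idea behind such procedures, the sketch below scans candidate change points in a sequence of person-fit residuals and returns the split that maximizes the between-segment contrast. It is a minimal generic scan under assumed inputs, not Yu and Cheng's published CPA-WR statistic; the function name and residual definition are hypothetical.

import numpy as np

def cpa_change_point(residuals):
    """Return the candidate change point maximizing the weighted absolute
    difference in mean residual between the two segments.

    `residuals` are person-fit residuals (observed minus model-expected
    item scores) in administration order; this is an illustrative scan,
    not the published CPA-WR statistic.
    """
    n = len(residuals)
    best_k, best_stat = None, -np.inf
    for k in range(1, n):  # split: items [0, k) vs. items [k, n)
        before, after = residuals[:k], residuals[k:]
        # Weight by segment sizes so very early/late splits are not favored
        stat = abs(before.mean() - after.mean()) * np.sqrt(k * (n - k) / n)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat

# Simulated example: model-consistent responding for 40 items,
# then back random responding for the final 20 items
rng = np.random.default_rng(1)
resid = np.concatenate([rng.normal(0.0, 1.0, 40),
                        rng.normal(-1.5, 1.0, 20)])
print(cpa_change_point(resid))  # change point detected near item 40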
Peer reviewed
van Laar, Saskia; Braeken, Johan – Journal of Educational Measurement, 2022
The low-stakes character of international large-scale educational assessments implies that a participating student might at times provide unrelated answers, as if they were not reading the items and were choosing response options at random throughout. Depending on the severity of this invalid response behavior, interpretations of the assessment…
Descriptors: Achievement Tests, Elementary Secondary Education, International Assessment, Foreign Countries
Peer reviewed
Wind, Stefanie A. – Journal of Educational Measurement, 2019
Numerous researchers have proposed methods for evaluating the quality of rater-mediated assessments using nonparametric methods (e.g., kappa coefficients) and parametric methods (e.g., the many-facet Rasch model). Generally speaking, popular nonparametric methods for evaluating rating quality are not based on a particular measurement theory. On…
Descriptors: Nonparametric Statistics, Test Validity, Test Reliability, Item Response Theory
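For reference, the two families contrasted here can each be summarized in one line. Cohen's kappa corrects observed agreement for chance, while the many-facet Rasch model decomposes the log-odds of adjacent rating categories into facet parameters (standard forms, not the article's own notation):

\[
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \lambda_j - \tau_k,
\]

where p_o and p_e are observed and chance-expected agreement, and \theta_n, \delta_i, \lambda_j, and \tau_k are examinee ability, item difficulty, rater severity, and the category threshold.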
Peer reviewed
Liu, Bowen; Kennedy, Patrick C.; Seipel, Ben; Carlson, Sarah E.; Biancarosa, Gina; Davison, Mark L. – Journal of Educational Measurement, 2019
This article describes an ongoing project to develop a formative, inferential reading comprehension assessment of causal story comprehension. It has three features to enhance classroom use: equated scale scores for progress monitoring within and across grades, a scale score to distinguish among low-scoring students based on patterns of mistakes,…
Descriptors: Formative Evaluation, Reading Comprehension, Story Reading, Test Construction
Peer reviewed
Ju, Unhee; Falk, Carl F. – Journal of Educational Measurement, 2019
We examined the feasibility and results of a multilevel multidimensional nominal response model (ML-MNRM) for measuring both substantive constructs and extreme response style (ERS) across countries. The ML-MNRM considers within-country clustering while allowing overall item slopes to vary across items and examination of whether certain items were…
Descriptors: Cross Cultural Studies, Self Efficacy, Item Response Theory, Item Analysis
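The nominal response model underlying the ML-MNRM assigns each response category its own slope and intercept; the unidimensional form is shown below as a standard sketch, with the multidimensional ERS extension adding a second latent dimension (and its own slopes) to the numerator:

\[
P(y_{ij} = k \mid \theta_j) = \frac{\exp(a_{ik}\theta_j + c_{ik})}{\sum_{m=1}^{K}\exp(a_{im}\theta_j + c_{im})}.
\]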
Peer reviewed
Kane, Michael T. – Journal of Educational Measurement, 2013
To validate an interpretation or use of test scores is to evaluate the plausibility of the claims based on the scores. An argument-based approach to validation suggests that the claims based on the test scores be outlined as an argument that specifies the inferences and supporting assumptions needed to get from test responses to score-based…
Descriptors: Test Interpretation, Validity, Scores, Test Use
Peer reviewed
Shin, Hyo Jeong; Wilson, Mark; Choi, In-Hee – Journal of Educational Measurement, 2017
This study proposes a structured constructs model (SCM) to examine measurement in the context of a multidimensional learning progression (LP). The LP is assumed to have features that go beyond a typical multidimensional IRT model, in that there are hypothesized to be certain cross-dimensional linkages that correspond to requirements between the…
Descriptors: Middle School Students, Student Evaluation, Measurement Techniques, Learning Processes
Peer reviewed
Briggs, Derek C. – Journal of Educational Measurement, 2013
A vertical score scale is needed to measure growth across multiple tests in terms of absolute changes in magnitude. Since the warrant for subsequent growth interpretations depends upon the assumption that the scale has interval properties, the validation of a vertical scale would seem to require methods for distinguishing interval scales from…
Descriptors: Measurement, Scaling, Validity, Test Interpretation
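The interval-scale requirement can be stated compactly: score meanings must be invariant under positive linear transformations, and only those. Under any rescaling x' = ax + b with a > 0, ratios of score differences, and hence comparisons of growth, are preserved:

\[
\frac{x'_2 - x'_1}{x'_4 - x'_3} = \frac{x_2 - x_1}{x_4 - x_3},
\]

whereas ratios of the scores themselves are not. This is the property a validation method for a vertical scale would need to probe.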
Peer reviewed
Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011
A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…
Descriptors: Test Length, Test Items, Alignment (Education), Models
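The simplest instance of borrowing collateral information is Kelley's classic regressed estimate, which shrinks an unreliable subscore toward a mean; augmented-subscore methods extend the same logic by shrinking toward a composite of the other subscales. The univariate form below is a sketch of the idea, not the article's estimator:

\[
\hat{\tau} = \hat{\rho}\, x + (1 - \hat{\rho})\, \bar{x},
\]

where x is the observed subscore, \bar{x} the group mean, and \hat{\rho} the subscore reliability.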
Peer reviewed
Roussos, Louis A.; Templin, Jonathan L.; Henson, Robert A. – Journal of Educational Measurement, 2007
This article describes a latent trait approach to skills diagnosis based on a particular variety of latent class models that employ item response functions (IRFs) as in typical item response theory (IRT) models. To enable and encourage comparisons with other approaches, this description is provided in terms of the main components of any…
Descriptors: Validity, Identification, Psychometrics, Item Response Theory
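One widely cited member of this family of latent class models with item response functions is the DINA model, shown here purely as an illustrative instance (the article's framework is more general):

\[
P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i) = (1 - s_j)^{\eta_{ij}}\, g_j^{\,1-\eta_{ij}},
\qquad
\eta_{ij} = \prod_{k} \alpha_{ik}^{\,q_{jk}},
\]

where \alpha_{ik} indicates examinee i's mastery of skill k, q_{jk} is the Q-matrix entry linking item j to skill k, and s_j and g_j are the item's slip and guess parameters.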
Peer reviewed
Pommerich, Mary – Journal of Educational Measurement, 2006
Domain scores have been proposed as a user-friendly way of providing instructional feedback about examinees' skills. Domain performance typically cannot be measured directly; instead, scores must be estimated using available information. Simulation studies suggest that IRT-based methods yield accurate group domain score estimates. Because…
Descriptors: Test Validity, Scores, Simulation, Evaluation Methods
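An IRT-based domain score is typically defined as the model-expected proportion correct over the N items representing the domain; a standard formulation, evaluated at an ability estimate and then averaged over the group, is:

\[
\hat{\tau}(\hat{\theta}) = \frac{1}{N}\sum_{j=1}^{N} P_j(\hat{\theta}).
\]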
Peer reviewed
Ackerman, Terry A. – Journal of Educational Measurement, 1992
The difference between item bias and item impact and the way they relate to item validity are discussed from a multidimensional item response theory perspective. The Mantel-Haenszel procedure and the Simultaneous Item Bias strategy are used in a Monte Carlo study to illustrate detection of item bias. (SLD)
Descriptors: Causal Models, Computer Simulation, Construct Validity, Equations (Mathematics)
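The Mantel-Haenszel procedure mentioned here pools 2 x 2 tables across matched score levels k; its common odds ratio and the familiar ETS delta rescaling are:

\[
\hat{\alpha}_{MH} = \frac{\sum_k A_k D_k / T_k}{\sum_k B_k C_k / T_k},
\qquad
\mathrm{MH\ D\text{-}DIF} = -2.35 \ln \hat{\alpha}_{MH},
\]

where, at score level k, A_k and B_k are the reference group's correct and incorrect counts, C_k and D_k are the focal group's, and T_k is the total.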