Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity

Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011
A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…
Descriptors: Test Length, Test Items, Alignment (Education), Models

Ackerman, Terry A. – Journal of Educational Measurement, 1992
The difference between item bias and item impact and the way they relate to item validity are discussed from a multidimensional item response theory perspective. The Mantel-Haenszel procedure and the Simultaneous Item Bias strategy are used in a Monte Carlo study to illustrate detection of item bias. (SLD)
Descriptors: Causal Models, Computer Simulation, Construct Validity, Equations (Mathematics)
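
The Mantel-Haenszel statistic mentioned in this abstract can be summarized in a few lines. The sketch below uses invented per-stratum counts and is only illustrative of the statistic itself, not of the article's Monte Carlo design.

```python
import math

# Per-stratum 2x2 counts for the studied item, matched on total score:
# (reference right, reference wrong, focal right, focal wrong).
# All numbers are invented for illustration.
strata = [
    (30, 20, 25, 25),
    (45, 15, 38, 22),
    (60, 10, 52, 18),
]

num = 0.0
den = 0.0
for a, b, c, d in strata:
    n = a + b + c + d
    num += a * d / n   # reference-right x focal-wrong
    den += b * c / n   # reference-wrong x focal-right

alpha_mh = num / den                   # Mantel-Haenszel common odds ratio
mh_d_dif = -2.35 * math.log(alpha_mh)  # ETS delta metric; near 0 means little DIF
print(f"alpha_MH = {alpha_mh:.3f}, MH D-DIF = {mh_d_dif:.2f}")
```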

Sawyer, Richard – Journal of Educational Measurement, 1996
Decision theory is a useful method for assessing the effectiveness of the components of a course placement system. The effectiveness of placement tests or other variables in identifying underprepared students is described by the conditional probability of success in a standard course. Estimating the conditional probability of success is discussed.…
Descriptors: College Students, Estimation (Mathematics), Higher Education, Mathematical Models
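
As a rough illustration of the kind of estimate Sawyer discusses, the sketch below fits a logistic regression of course success on a placement score. The data, the cut score, and the use of scikit-learn are my own assumptions, not taken from the article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical placement scores and success (1) / failure (0) in the standard course.
scores = np.array([12, 15, 18, 20, 22, 25, 27, 30, 33, 35]).reshape(-1, 1)
success = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])

model = LogisticRegression().fit(scores, success)

# Estimated conditional probability of success at a candidate cut score.
cut = 21
p_success = model.predict_proba([[cut]])[0, 1]
print(f"Estimated P(success | score = {cut}) = {p_success:.2f}")
```
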
Wise, Steven L.; DeMars, Christine E. – Journal of Educational Measurement, 2006
The validity of inferences based on achievement test scores is dependent on the amount of effort that examinees put forth while taking the test. With low-stakes tests, for which this problem is particularly prevalent, there is a consequent need for psychometric models that can take into account differing levels of examinee effort. This article…
Descriptors: Guessing (Tests), Psychometrics, Inferences, Reaction Time
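
One common way to write such an effort-moderated item response function (my rendering; the article's exact parameterization may differ) lets a response-time-based indicator of solution behavior switch between the regular item response function and a chance-level guessing rate:

```latex
% Effort-moderated response probability (schematic; notation is mine).
% SB_ij = 1 if examinee j's response time on item i indicates solution behavior,
% 0 if it indicates rapid guessing; P_i is the usual item response function and
% g_i is the chance success rate (e.g., 1/number of options) for item i.
\[
  P(X_{ij} = 1 \mid \theta_j)
    = SB_{ij}\, P_i(\theta_j) + (1 - SB_{ij})\, g_i .
\]
```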

Wardrop, James L.; And Others – Journal of Educational Measurement, 1982
A structure for describing different approaches to testing is generated by identifying five dimensions along which tests differ: test uses, item generation, item revision, assessment of precision, and validation. These dimensions are used to profile tests of reading comprehension. Only norm-referenced achievement tests had an inference system…
Descriptors: Achievement Tests, Comparative Analysis, Educational Testing, Models

Whitely, Susan E. – Journal of Educational Measurement, 1977
A debate concerning specific issues and the general usefulness of the Rasch latent trait test model is continued. Methods of estimation, necessary sample size, and the applicability of the model are discussed. (JKS)
Descriptors: Error of Measurement, Item Analysis, Mathematical Models, Measurement
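
For reference, the dichotomous Rasch model at issue in the Whitely-Wright exchange, in its standard textbook form (not quoted from either article):

```latex
% Rasch model for a correct response by person j (ability theta_j) to item i
% (difficulty b_i); only the difference theta_j - b_i matters.
\[
  P(X_{ij} = 1 \mid \theta_j, b_i)
    = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)} .
\]
```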

Woodruff, David – Journal of Educational Measurement, 1990
A method of estimating conditional standard error of measurement at specific score/ability levels is described that avoids theoretical problems identified for previous methods. The method focuses on variance of observed scores conditional on a fixed value of an observed parallel measurement, decomposing these variances into true and error parts.…
Descriptors: Error of Measurement, Estimation (Mathematics), Mathematical Models, Predictive Measurement
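
The decomposition described in this abstract can be written schematically as follows (my notation; it assumes uncorrelated true and error components, as in classical test theory):

```latex
% Variance of observed score X, conditional on a fixed value x' of a parallel
% measurement X', split into true-score and error parts; the error part yields
% a conditional standard error of measurement at that score level.
\[
  \operatorname{Var}(X \mid X' = x')
    = \operatorname{Var}(T \mid X' = x') + \operatorname{Var}(E \mid X' = x'),
  \qquad
  \mathrm{SEM}(x') = \sqrt{\operatorname{Var}(E \mid X' = x')} .
\]
```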

Wright, Benjamin D. – Journal of Educational Measurement, 1977
Statements made in a previous article of this journal concerning the Rasch latent trait test model are questioned. Methods of estimation, necessary sample sizes, several formulas, and the general usefulness of the Rasch model are discussed. (JKS)
Descriptors: Computers, Error of Measurement, Item Analysis, Mathematical Models

Hambleton, Ronald K.; Novick, Melvin R. – Journal of Educational Measurement, 1973
In this paper, an attempt has been made to synthesize some of the current thinking in the area of criterion-referenced testing as well as to provide the beginning of an integration of theory and method for such testing. (Editor)
Descriptors: Bayesian Statistics, Criterion Referenced Tests, Decision Making, Definitions

Emrick, John A. – Journal of Educational Measurement, 1971
Descriptors: Criterion Referenced Tests, Error of Measurement, Evaluation Methods, Item Analysis

Kim, Seock-Ho; Cohen, Allan S. – Journal of Educational Measurement, 1992
Effects of the following methods for linking metrics on detection of differential item functioning (DIF) were compared: (1) test characteristic curve method (TCC); (2) weighted mean and sigma method; and (3) minimum chi-square method. With large samples, results were essentially the same. With small samples, TCC was most accurate. (SLD)
Descriptors: Chi Square, Comparative Analysis, Equations (Mathematics), Estimation (Mathematics)
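
Of the three linking methods compared, the mean and sigma approach is the easiest to sketch. The unweighted version below uses invented item parameter estimates and is meant only to show the form of the linear metric transformation, not the authors' implementation.

```python
import numpy as np

# Hypothetical 2PL difficulty (b) and discrimination (a) estimates for the same
# items from two separate calibrations; all numbers are invented.
b_focal = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
a_focal = np.array([0.9, 1.1, 1.3, 0.8, 1.0])
b_ref = np.array([-1.0, -0.2, 0.3, 1.0, 1.8])

# Linear transformation theta* = A*theta + B chosen so that the focal
# difficulties match the mean and standard deviation of the reference ones.
A = b_ref.std(ddof=1) / b_focal.std(ddof=1)
B = b_ref.mean() - A * b_focal.mean()

b_linked = A * b_focal + B   # difficulties on the reference metric
a_linked = a_focal / A       # discriminations rescale by 1/A
print(f"A = {A:.3f}, B = {B:.3f}")
print("linked b:", np.round(b_linked, 3))
print("linked a:", np.round(a_linked, 3))
```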

Embretson, Susan; And Others – Journal of Educational Measurement, 1986
This study examined the influence of processing strategies, and the metacomponents that determine when to apply them, on the construct validity of a verbal reasoning test. A rule-oriented strategy, an association strategy, and a partial rule strategy were examined. All three strategies contributed to individual differences in verbal reasoning.…
Descriptors: Cognitive Processes, Elementary Secondary Education, Error of Measurement, Latent Trait Theory

Wainer, Howard; And Others – Journal of Educational Measurement, 1991
Hierarchical (adaptive) and linear methods of testlet construction were compared. The performance of 2,080 ninth and tenth graders on a 4-item testlet was used to predict performance on the entire test. The adaptive test was slightly superior as a predictor, but the cost of obtaining that superiority was considerable. (SLD)
Descriptors: Adaptive Testing, Algebra, Comparative Testing, High School Students

Wainer, Howard – Journal of Educational Measurement, 1986
Describes recent research attempts to draw inferences about the relative standing of the states on the basis of mean SAT scores. This paper identifies five serious errors that call into question the validity of such inferences. Some plausible ways to avoid the errors are described. (Author/LMO)
Descriptors: College Entrance Examinations, Equated Scores, Mathematical Models, Predictor Variables