Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity

Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011
A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…
Descriptors: Test Length, Test Items, Alignment (Education), Models

Ackerman, Terry A. – Journal of Educational Measurement, 1992
The difference between item bias and item impact and the way they relate to item validity are discussed from a multidimensional item response theory perspective. The Mantel-Haenszel procedure and the Simultaneous Item Bias strategy are used in a Monte Carlo study to illustrate detection of item bias. (SLD)
Descriptors: Causal Models, Computer Simulation, Construct Validity, Equations (Mathematics)
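
The Mantel-Haenszel statistic mentioned in this abstract can be summarized in a few lines. The sketch below uses invented per-stratum counts and is only illustrative of the statistic itself, not of the article's Monte Carlo design.

```python
import math

# Per-stratum 2x2 counts for the studied item, matched on total score:
# (reference right, reference wrong, focal right, focal wrong).
# All numbers are invented for illustration.
strata = [
    (30, 20, 25, 25),
    (45, 15, 38, 22),
    (60, 10, 52, 18),
]

num = 0.0
den = 0.0
for a, b, c, d in strata:
    n = a + b + c + d
    num += a * d / n   # reference-right x focal-wrong
    den += b * c / n   # reference-wrong x focal-right

alpha_mh = num / den                   # Mantel-Haenszel common odds ratio
mh_d_dif = -2.35 * math.log(alpha_mh)  # ETS delta metric; near 0 means little DIF
print(f"alpha_MH = {alpha_mh:.3f}, MH D-DIF = {mh_d_dif:.2f}")
```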

Sawyer, Richard – Journal of Educational Measurement, 1996
Decision theory is a useful method for assessing the effectiveness of the components of a course placement system. The effectiveness of placement tests or other variables in identifying underprepared students is described by the conditional probability of success in a standard course. Estimating the conditional probability of success is discussed.…
Descriptors: College Students, Estimation (Mathematics), Higher Education, Mathematical Models
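
As a rough illustration of the kind of estimate Sawyer discusses, the sketch below fits a logistic regression of course success on a placement score. The data, the cut score, and the use of scikit-learn are my own assumptions, not taken from the article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical placement scores and success (1) / failure (0) in the standard course.
scores = np.array([12, 15, 18, 20, 22, 25, 27, 30, 33, 35]).reshape(-1, 1)
success = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])

model = LogisticRegression().fit(scores, success)

# Estimated conditional probability of success at a candidate cut score.
cut = 21
p_success = model.predict_proba([[cut]])[0, 1]
print(f"Estimated P(success | score = {cut}) = {p_success:.2f}")
```
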
Wise, Steven L.; DeMars, Christine E. – Journal of Educational Measurement, 2006
The validity of inferences based on achievement test scores is dependent on the amount of effort that examinees put forth while taking the test. With low-stakes tests, for which this problem is particularly prevalent, there is a consequent need for psychometric models that can take into account differing levels of examinee effort. This article…
Descriptors: Guessing (Tests), Psychometrics, Inferences, Reaction Time
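
One common way to write such an effort-moderated item response function (my rendering; the article's exact parameterization may differ) lets a response-time-based indicator of solution behavior switch between the regular item response function and a chance-level guessing rate:

```latex
% Effort-moderated response probability (schematic; notation is mine).
% SB_ij = 1 if examinee j's response time on item i indicates solution behavior,
% 0 if it indicates rapid guessing; P_i is the usual item response function and
% g_i is the chance success rate (e.g., 1/number of options) for item i.
\[
  P(X_{ij} = 1 \mid \theta_j)
    = SB_{ij}\, P_i(\theta_j) + (1 - SB_{ij})\, g_i .
\]
```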

Wardrop, James L.; And Others – Journal of Educational Measurement, 1982
A structure for describing different approaches to testing is generated by identifying five dimensions along which tests differ: test uses, item generation, item revision, assessment of precision, and validation. These dimensions are used to profile tests of reading comprehension. Only norm-referenced achievement tests had an inference system…
Descriptors: Achievement Tests, Comparative Analysis, Educational Testing, Models

Whitely, Susan E. – Journal of Educational Measurement, 1977
A debate concerning specific issues and the general usefulness of the Rasch latent trait test model is continued. Methods of estimation, necessary sample size, and the applicability of the model are discussed. (JKS)
Descriptors: Error of Measurement, Item Analysis, Mathematical Models, Measurement
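
For reference, the dichotomous Rasch model at issue in the Whitely-Wright exchange, in its standard textbook form (not quoted from either article):

```latex
% Rasch model for a correct response by person j (ability theta_j) to item i
% (difficulty b_i); only the difference theta_j - b_i matters.
\[
  P(X_{ij} = 1 \mid \theta_j, b_i)
    = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)} .
\]
```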

Woodruff, David – Journal of Educational Measurement, 1990
A method of estimating conditional standard error of measurement at specific score/ability levels is described that avoids theoretical problems identified for previous methods. The method focuses on variance of observed scores conditional on a fixed value of an observed parallel measurement, decomposing these variances into true and error parts.…
Descriptors: Error of Measurement, Estimation (Mathematics), Mathematical Models, Predictive Measurement
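
The decomposition described in this abstract can be written schematically as follows (my notation; it assumes uncorrelated true and error components, as in classical test theory):

```latex
% Variance of observed score X, conditional on a fixed value x' of a parallel
% measurement X', split into true-score and error parts; the error part yields
% a conditional standard error of measurement at that score level.
\[
  \operatorname{Var}(X \mid X' = x')
    = \operatorname{Var}(T \mid X' = x') + \operatorname{Var}(E \mid X' = x'),
  \qquad
  \mathrm{SEM}(x') = \sqrt{\operatorname{Var}(E \mid X' = x')} .
\]
```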

Wright, Benjamin D. – Journal of Educational Measurement, 1977
Statements made in a previous article of this journal concerning the Rasch latent trait test model are questioned. Methods of estimation, necessary sample sizes, several formulas, and the general usefulness of the Rasch model are discussed. (JKS)
Descriptors: Computers, Error of Measurement, Item Analysis, Mathematical Models

Hambleton, Ronald K.; Novick, Melvin R. – Journal of Educational Measurement, 1973
In this paper, an attempt has been made to synthesize some of the current thinking in the area of criterion-referenced testing as well as to provide the beginning of an integration of theory and method for such testing. (Editor)
Descriptors: Bayesian Statistics, Criterion Referenced Tests, Decision Making, Definitions

Emrick, John A. – Journal of Educational Measurement, 1971
Descriptors: Criterion Referenced Tests, Error of Measurement, Evaluation Methods, Item Analysis

Kim, Seock-Ho; Cohen, Allan S. – Journal of Educational Measurement, 1992
Effects of the following methods for linking metrics on detection of differential item functioning (DIF) were compared: (1) test characteristic curve method (TCC); (2) weighted mean and sigma method; and (3) minimum chi-square method. With large samples, results were essentially the same. With small samples, TCC was most accurate. (SLD)
Descriptors: Chi Square, Comparative Analysis, Equations (Mathematics), Estimation (Mathematics)
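
Of the three linking methods compared, the mean and sigma approach is the easiest to sketch. The unweighted version below uses invented item parameter estimates and is meant only to show the form of the linear metric transformation, not the authors' implementation.

```python
import numpy as np

# Hypothetical 2PL difficulty (b) and discrimination (a) estimates for the same
# items from two separate calibrations; all numbers are invented.
b_focal = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
a_focal = np.array([0.9, 1.1, 1.3, 0.8, 1.0])
b_ref = np.array([-1.0, -0.2, 0.3, 1.0, 1.8])

# Linear transformation theta* = A*theta + B chosen so that the focal
# difficulties match the mean and standard deviation of the reference ones.
A = b_ref.std(ddof=1) / b_focal.std(ddof=1)
B = b_ref.mean() - A * b_focal.mean()

b_linked = A * b_focal + B   # difficulties on the reference metric
a_linked = a_focal / A       # discriminations rescale by 1/A
print(f"A = {A:.3f}, B = {B:.3f}")
print("linked b:", np.round(b_linked, 3))
print("linked a:", np.round(a_linked, 3))
```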

Embretson, Susan; And Others – Journal of Educational Measurement, 1986
This study examined the influence of processing strategies, and the metacomponents that determine when to apply them, on the construct validity of a verbal reasoning test. A rule-oriented strategy, an association strategy, and a partial rule strategy were examined. All three strategies contributed to individual differences in verbal reasoning.…
Descriptors: Cognitive Processes, Elementary Secondary Education, Error of Measurement, Latent Trait Theory

Wainer, Howard; And Others – Journal of Educational Measurement, 1991
Hierarchical (adaptive) and linear methods of testlet construction were compared. The performance of 2,080 ninth and tenth graders on a 4-item testlet was used to predict performance on the entire test. The adaptive test was slightly superior as a predictor, but the cost of obtaining that superiority was considerable. (SLD)
Descriptors: Adaptive Testing, Algebra, Comparative Testing, High School Students

Wainer, Howard – Journal of Educational Measurement, 1986
Describes recent research attempts to draw inferences about the relative standing of the states on the basis of mean SAT scores. This paper identifies five serious errors that call into question the validity of such inferences. Some plausible ways to avoid the errors are described. (Author/LMO)
Descriptors: College Entrance Examinations, Equated Scores, Mathematical Models, Predictor Variables