Publication Date
| Date range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 2 |
| Since 2022 (last 5 years) | 12 |
| Since 2017 (last 10 years) | 26 |
| Since 2007 (last 20 years) | 90 |
Descriptor
| Descriptor | Records |
| --- | --- |
| True Scores | 416 |
| Error of Measurement | 121 |
| Test Reliability | 110 |
| Statistical Analysis | 107 |
| Mathematical Models | 97 |
| Item Response Theory | 87 |
| Correlation | 76 |
| Equated Scores | 76 |
| Reliability | 64 |
| Test Theory | 52 |
| Test Items | 51 |
Audience
| Audience | Records |
| --- | --- |
| Researchers | 12 |
| Practitioners | 2 |
| Administrators | 1 |
| Teachers | 1 |
Location
| Location | Records |
| --- | --- |
| Australia | 1 |
| Canada | 1 |
| China | 1 |
| Colorado | 1 |
| Illinois | 1 |
| Israel | 1 |
| New York | 1 |
| Oregon | 1 |
| Taiwan | 1 |
| Texas | 1 |
| United Kingdom (England) | 1 |
Laws, Policies, & Programs
| Law, Policy, or Program | Records |
| --- | --- |
| Elementary and Secondary… | 1 |
Ree, Malcolm James – 1978
Item characteristic curve (ICC) theory describes the relationship between the ability of individuals and the probability of their answering a test question correctly; it is useful in estimating test scores, equating the scores of various tests, and scoring responses during adaptive testing. A simulation study of the effectiveness of the following…
Descriptors: Ability, Comparative Analysis, Computer Programs, Item Analysis
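
The logistic ICC models recurring in these records have a standard closed form: the probability of a correct response is a lower asymptote c plus a logistic rise governed by discrimination a and difficulty b. A minimal sketch of the three-parameter version (the parameter values are illustrative, not taken from Ree's simulation):

```python
import numpy as np

def icc_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic item characteristic curve.
    theta: ability; a: discrimination; b: difficulty;
    c: pseudo-guessing lower asymptote;
    D = 1.7 scales the logistic to approximate the normal ogive."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

# Probability that an average-ability examinee (theta = 0) answers a
# moderately difficult item correctly (illustrative parameters).
print(icc_3pl(theta=0.0, a=1.2, b=0.5, c=0.2))
```

The one- and two-parameter logistic models compared in the Douglass study below are the special cases c = 0 (2PL) and, additionally, a held constant across items (1PL).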
Douglass, James B. – 1981
Methods and results relevant to the introduction of item characteristic curve (ICC) models into classroom achievement testing are provided. The overall objective was to compare several common ICC models for item calibration and test equating in a classroom examination system. Parameters for the one-, two- and three-parameter logistic ICC models…
Descriptors: Academic Achievement, Comparative Analysis, Difficulty Level, Equated Scores
deGruijter, Dato N. M. – 1980
The setting of standards involves subjective value judgments. The inherent arbitrariness of specific standards has been severely criticized by Glass. His antagonists agree that standard setting is a judgmental task but they have pointed out that arbitrariness in the positive sense of serious judgmental decisions is unavoidable. Further, small…
Descriptors: Cutting Scores, Difficulty Level, Error of Measurement, Mastery Tests
Kolen, Michael J. – 1980
Results from equipercentile, linear, and latent trait equating of the vocabulary and quantitative thinking tests of the Iowa Tests of Educational Development were compared. The study entailed both the equating of forms (of similar difficulty) and the equating of levels (of differing difficulty). The goal was to equate seventh edition tests to…
Descriptors: Achievement Tests, Difficulty Level, Equated Scores, Guessing (Tests)
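
The two non-IRT methods Kolen compares have compact textbook forms: linear equating matches the first two moments of the score distributions, while equipercentile equating maps each score to the score on the other form with the same percentile rank. A minimal sketch on hypothetical random-groups samples (no smoothing, which operational equipercentile equating would normally add):

```python
import numpy as np

def linear_equate(x_scores, y_scores, x):
    """Linear equating: match means and standard deviations,
    y = mu_Y + (sd_Y / sd_X) * (x - mu_X)."""
    mu_x, sd_x = np.mean(x_scores), np.std(x_scores)
    mu_y, sd_y = np.mean(y_scores), np.std(y_scores)
    return mu_y + (sd_y / sd_x) * (x - mu_x)

def equipercentile_equate(x_scores, y_scores, x):
    """Equipercentile equating: map x to the form-Y score with the
    same percentile rank (unsmoothed in this sketch)."""
    p = np.mean(np.asarray(x_scores) <= x)   # percentile rank of x on form X
    return np.quantile(y_scores, p)          # matching quantile on form Y

rng = np.random.default_rng(0)
x_scores = rng.binomial(40, 0.60, size=500)  # hypothetical form X scores
y_scores = rng.binomial(40, 0.55, size=500)  # hypothetical, harder form Y
print(linear_equate(x_scores, y_scores, x=25))
print(equipercentile_equate(x_scores, y_scores, x=25))
```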
Gleser, Leon Jay – 1971
An attempt is made to indicate why the concept of "true score" naturally leads to the belief that test validity must increase with an increase in test and/or average item reliability, and why this is correct for the classical single-factor model first introduced by Spearman. The statistical model used by Loevinger is introduced to…
Descriptors: Factor Analysis, Item Analysis, Mathematical Models, Measurement Techniques
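
The classical mechanism behind that belief is Spearman's attenuation relation: the observed-score correlation is the true-score correlation shrunk by the square roots of the two reliabilities, so raising either reliability raises the attainable validity. In standard classical test theory notation (a textbook identity, not a formula quoted from Gleser's paper):

```latex
% Correction for attenuation: observed correlation equals the
% true-score correlation damped by the two reliabilities.
\rho_{XY} = \rho_{T_X T_Y}\,\sqrt{\rho_{XX'}\,\rho_{YY'}}
\quad\Longrightarrow\quad
|\rho_{XY}| \le \sqrt{\rho_{XX'}\,\rho_{YY'}}
```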
Kristof, Walter – 1971
We concern ourselves with the hypothesis that two variables have a perfect disattenuated correlation, hence measure the same trait except for errors of measurement. This hypothesis is equivalent to saying, within the adopted model, that true scores of two psychological tests satisfy a linear relation. Statistical tests of this hypothesis are…
Descriptors: Analysis of Covariance, Comparative Analysis, Correlation, Error of Measurement
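
Kristof's hypothesis of a perfect disattenuated correlation amounts to the true-score correlation in the identity above equaling 1. The disattenuated correlation itself is the observed correlation divided by the square root of the product of the two reliabilities; a minimal sketch with illustrative inputs:

```python
import math

def disattenuated_correlation(r_xy, r_xx, r_yy):
    """Correct an observed correlation for measurement error:
    rho(T_X, T_Y) = r_XY / sqrt(r_XX' * r_YY')."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Illustrative values: observed correlation .72, reliabilities .81 and .64.
# A result of 1.0 is consistent with the two tests measuring the same trait.
print(disattenuated_correlation(0.72, 0.81, 0.64))  # -> 1.0
```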
Yen, Wendy M. – Journal of Educational Measurement, 1984 (peer reviewed)
A procedure for obtaining maximum likelihood trait estimates from number-correct (NC) scores for the three-parameter logistic model is presented. It produces an NC score to trait estimate conversion table. Analyses in the estimated true score metric confirm the conclusions made in the trait metric. (Author/DWH)
Descriptors: Achievement Tests, Error of Measurement, Estimation (Mathematics), Latent Trait Theory
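
One common device for building such a conversion table is to invert the test characteristic curve, the sum of the item ICCs, so that each number-correct score maps to the ability at which the expected number-correct score equals it. The sketch below uses that device with made-up item parameters; it illustrates the idea of an NC-to-trait table rather than reproducing Yen's exact maximum likelihood procedure:

```python
import numpy as np
from scipy.optimize import brentq

def p3pl(theta, a, b, c, D=1.7):
    return c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score."""
    return sum(p3pl(theta, *item) for item in items)

def nc_to_theta(nc, items, lo=-6.0, hi=6.0):
    """Invert the TCC to map a number-correct score to a trait estimate."""
    return brentq(lambda t: tcc(t, items) - nc, lo, hi)

# Hypothetical (a, b, c) parameters for a five-item test.
items = [(1.0, -1.0, 0.20), (1.2, -0.5, 0.20), (0.8, 0.0, 0.25),
         (1.5, 0.5, 0.15), (1.1, 1.0, 0.20)]
# Conversion table for NC scores between the chance floor and the ceiling.
for nc in (2, 3, 4):
    print(nc, round(nc_to_theta(nc, items), 3))
```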
Feldt, Leonard S.; Spray, Judith A. – Research Quarterly for Exercise and Sport, 1983 (peer reviewed)
The reliabilities of two types of measurement plans were compared across six hypothetical distributions of true scores or abilities. The measurement plans were: (1) fixed-length, where the number of trials for all examinees is set in advance; and (2) trials-to-criterion, where examinees must keep trying until they complete a given number of trials…
Descriptors: Criterion Referenced Tests, Evaluation Methods, Higher Education, Measurement Techniques
De Champlain, Andre F. – 1995
The dimensionality of one form of the Law School Admission Test (LSAT) was assessed with respect to three ethnic groups of test takers. Whether differences in the ability composite have any noticeable impact on item response theory (IRT) true score equating results for these subgroups (African Americans, Hispanic Americans, and Whites) was also…
Descriptors: Ability, Blacks, Equated Scores, Ethnic Groups
Livingston, Samuel A. – Journal of Educational Measurement, 1972 (peer reviewed)
A reliability coefficient for criterion-referenced tests is developed from the assumptions of classical test theory. The coefficient is based on deviations of scores from the criterion score, rather than from the mean. (Author/CK)
Descriptors: Criterion Referenced Tests, Error of Measurement, Mathematical Applications, Norm Referenced Tests
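
Livingston's coefficient replaces deviations from the mean with deviations from the criterion (cut) score C: with classical reliability r, observed-score variance s², and mean m, it is K² = (r·s² + (m − C)²) / (s² + (m − C)²). A minimal sketch with illustrative values:

```python
def livingston_k2(reliability, var_x, mean_x, criterion):
    """Livingston's criterion-referenced reliability coefficient:
    K^2 = (r * s^2 + (m - C)^2) / (s^2 + (m - C)^2).
    Reduces to the classical coefficient when the criterion equals the mean."""
    d2 = (mean_x - criterion) ** 2
    return (reliability * var_x + d2) / (var_x + d2)

# Illustrative: classical reliability .80, variance 25, mean 70, cut score 60.
print(livingston_k2(0.80, 25.0, 70.0, 60.0))  # (20 + 100) / (25 + 100) = 0.96
```

Harris's comment below reads the size of K² through the familiar dependence of reliability coefficients on the range of talent: moving the criterion away from the mean inflates the coefficient much as widening the score range would.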
Harris, Chester W. – Journal of Educational Measurement, 1972 (peer reviewed)
An alternative interpretation of Livingston's reliability coefficient (see TM 500 487) is based on the notion of the relation of the size of the reliability coefficient to the range of talent. (Author/CK)
Descriptors: Criterion Referenced Tests, Error of Measurement, Mathematical Applications, Norm Referenced Tests
McDonald, Roderick P. – Alberta Journal of Educational Research, 2003
The concept of a behavior domain is a reasonable and essential foundation for psychometric work based on true score theory, the linear model of common factor analysis, and the nonlinear models of item response theory. Investigators applying these models to test data generally treat the true scores or factors or traits as abstractive psychological…
Descriptors: Factor Analysis, Error of Measurement, True Scores, Psychometrics
Nugent, William R. – Educational and Psychological Measurement, 2006
One of the most important effect sizes used in meta-analysis is the standardized mean difference (SMD). In this article, the conditions under which SMD effect sizes based on different measures of the same construct are directly comparable are investigated. The results show that SMD effect sizes from different measures of the same construct are…
Descriptors: Effect Size, Meta Analysis, True Scores, Error of Measurement
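
The comparability question turns on how measurement error inflates the standardizing denominator. In classical test theory the observed-score standard deviation is the true-score standard deviation divided by the square root of the reliability, so the observed SMD is the true SMD shrunk by that factor. A minimal sketch of this standard derivation (whether it matches Nugent's exact conditions is an assumption):

```python
import math

def observed_smd(true_smd, reliability):
    """Attenuation of a standardized mean difference by measurement error:
    sigma_X = sigma_T / sqrt(r), hence d_obs = d_true * sqrt(r)."""
    return true_smd * math.sqrt(reliability)

# The same true effect (d = 0.5) measured with instruments of different
# reliability yields observed SMDs that are not directly comparable.
print(observed_smd(0.5, 0.90))  # ~0.474
print(observed_smd(0.5, 0.60))  # ~0.387
```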
Zwick, Rebecca; And Others – 1994
A previous simulation study of methods for assessing differential item functioning (DIF) in computer-adaptive tests (CATs) showed that modified versions of the Mantel-Haenszel and standardization methods work well with CAT data. In that study, data were generated using the three-parameter logistic (3PL) model, and this same model was assumed in obtaining item…
Descriptors: Ability, Adaptive Testing, Computer Assisted Testing, Computer Simulation
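
The Mantel-Haenszel method mentioned here stratifies examinees by a matched score and pools 2×2 tables (reference vs. focal group by right vs. wrong) across strata. A minimal sketch of the common odds ratio and the ETS delta-metric statistic, with hypothetical counts:

```python
import math

def mantel_haenszel_dif(strata):
    """Each stratum: (A, B, C, D) = (reference right, reference wrong,
    focal right, focal wrong) at one matched-score level.
    Returns the MH common odds ratio and the ETS MH D-DIF statistic."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)  # negative values flag DIF against the focal group

# Hypothetical counts at three matched-score levels.
strata = [(30, 10, 25, 15), (40, 20, 30, 30), (20, 30, 12, 38)]
alpha, d_dif = mantel_haenszel_dif(strata)
print(round(alpha, 3), round(d_dif, 3))
```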
Reckase, Mark D.; And Others – 1985
Factor analysis is the traditional method for studying the dimensionality of test data. However, under common conditions, the factor analysis of tetrachoric correlations does not recover the underlying structure of dichotomous data. The purpose of this paper is to demonstrate that the factor analysis of tetrachoric correlations is unlikely to…
Descriptors: Correlation, Difficulty Level, Factor Analysis, Item Analysis

