Kristof, Walter – Psychometrika, 1974
Descriptors: Models, Statistical Analysis, Test Reliability, Testing

Nicewander, W. Alan – Psychometrika, 1975
Shows that the Harris factors of R have psychometric properties similar to those discussed by Kaiser and Caffrey (1965) and Bentler (1968). Specifically it is shown that the Harris factors of R maximize a lower-bound to the reliability of a composite measure derived by Guttman (1945). (Author/RC)
Descriptors: Correlation, Factor Analysis, Matrices, Prediction

Marks, Edmond; Lindsay, Carl A. – Journal of Educational Measurement, 1972
Examines the effects of four parameters on the accuracy of test equating under a relaxed definition of test form equivalence. The four parameters studied were sample size, test form length, test form reliability, and the correlation between true scores of the test forms to be equated. (CK)
Descriptors: Scores, Test Interpretation, Test Reliability, Test Results

Ramsay, J. O. – Educational and Psychological Measurement, 1971
The consequences of the assumption that the expected score is equal to the true score are shown and alternatives discussed. (MS)
Descriptors: Psychological Testing, Statistical Analysis, Test Reliability, Testing

Bowers, John – Educational and Psychological Measurement, 1971
Descriptors: Error of Measurement, Mathematical Models, Test Reliability, True Scores

Bond, Lloyd – Psychometrika, 1979
Tucker, Damarin, and Messick proposed a "base-free" measure of change which involves the computation of residual scores that are uncorrelated with true scores on the pretest. The present note discusses this change measure and demonstrates that properties they attribute to a are, in fact, properties of b. (Author/CTM)
Descriptors: Differences, Pretests Posttests, Research Reviews (Publications), Scores

Conger, Anthony J. – Educational and Psychological Measurement, 1980
Reliability-maximizing weights are related to theoretically specified true-score scaling weights to show a constant relationship that is invariant under separate linear transformations on each variable in the system. Test-theoretic relations should be derived for the most general model available and not for unnecessarily constrained models.…
Descriptors: Mathematical Formulas, Scaling, Test Reliability, Test Theory

Wilcox, Rand R. – Applied Psychological Measurement, 1979
Using a new coefficient, a rescaling of the Bayes risk is examined and a modification of this coefficient is described which yields an index that always has a value between zero and one. (Author/MH)
Descriptors: Bayesian Statistics, Measurement Techniques, Scoring, Technical Reports

Dimitrov, Dimiter M. – Journal of Applied Measurement, 2003
Proposes formulas for expected true-score measures and reliability of binary items as a function of their Rasch difficulty when the trait (ability) distribution is normal or logistic. Provides an illustrative example for using the proposed formulas. (SLD)
Descriptors: Ability, Difficulty Level, Item Response Theory, Reliability
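
As a rough illustration of the kind of quantity the abstract above refers to, the following sketch computes the marginal expected score of one binary Rasch item when ability is standard normal. It is not Dimitrov's closed-form formula, which the abstract does not reproduce; it uses plain numerical integration instead. The function names, the integration range, and the step count are all assumptions made for this example.

```python
import math


def rasch_probability(theta, b):
    """Rasch model: probability of a correct response at ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def normal_pdf(theta):
    """Standard normal density (assumed ability distribution)."""
    return math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)


def expected_item_score(b, lo=-8.0, hi=8.0, steps=4000):
    """Trapezoid-rule estimate of E[P(theta)] for theta ~ N(0, 1).

    lo, hi, and steps are illustrative choices, not values from the source.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # half weight at the endpoints
        total += weight * rasch_probability(theta, b) * normal_pdf(theta)
    return total * h


if __name__ == "__main__":
    for difficulty in (-1.0, 0.0, 1.0):
        print(difficulty, round(expected_item_score(difficulty), 4))
```

Under these assumptions an item whose difficulty equals the mean ability has an expected score of one half, and harder items have lower expected scores.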

Tisak, John; Tisak, Marie S. – Applied Psychological Measurement, 1996
Dynamic generalizations of reliability and validity that will incorporate longitudinal or developmental models, using latent curve analysis, are discussed. A latent curve model formulated to depict change is incorporated into the classical definitions of reliability and validity. The approach is illustrated with sociological and psychological…
Descriptors: Definitions, Development, Longitudinal Studies, Models

Cliff, Norman – Psychometrika, 1989
This paper argues that: test data are ordinal; latent trait scores are only determined ordinally; and test data are used largely for ordinal purposes. A set of ordinal assumptions is presented, including an ordinal version of local independence. It is concluded that a purely ordinal test theory is possible. (TJH)
Descriptors: Equations (Mathematics), Latent Trait Theory, Regression (Statistics), True Scores

Krus, David J.; Helmstadter, Gerald C. – Educational and Psychological Measurement, 1993
Negative coefficients of reliability, sometimes returned by the standard formula for estimation of the internal-consistency reliability, are neither theoretically nor numerically correct. Alternative strategies for test development in this special case are suggested. (Author)
Descriptors: Estimation (Mathematics), Reliability, Test Construction, Test Use
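
The abstract above does not name the "standard formula"; the sketch below assumes it is coefficient alpha (Cronbach's alpha) and shows how that estimate can turn negative when items covary negatively. The data sets and the function name are invented for illustration only.

```python
from statistics import variance


def cronbach_alpha(items):
    """Coefficient alpha from a list of items, each a list of examinee scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per examinee: sum across items at each position.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1.0 - item_variance_sum / variance(totals))


if __name__ == "__main__":
    consistent_items = [[1, 2, 3, 4], [1, 2, 3, 4]]  # items rank examinees identically
    opposed_items = [[1, 2, 3, 4], [4, 3, 2, 2]]     # items rank examinees in reverse
    print(cronbach_alpha(consistent_items))
    print(cronbach_alpha(opposed_items))  # below zero for this data
```

When the items pull in opposite directions, the total-score variance falls below the sum of the item variances, so the bracketed term and the estimate go negative. This is the special case the abstract addresses.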

Jiang, Hai; Stout, William – Journal of Educational and Behavioral Statistics, 1998
Proposes a new regression correction for the SIBTEST statistical tests (R. Shealy and W. Stout, 1993) that essentially uses a two-segment piecewise linear regression of the true on observed matching subtest scores. A simulation study illustrates the approach. (SLD)
Descriptors: Estimation (Mathematics), Item Bias, Regression (Statistics), Simulation

Stone, Gregory Ethan; Beltyukova, Svetlana; Fox, Christine M. – International Journal of Testing, 2008
Judge-mediated examinations are defined as those for which expert evaluation (using rubrics) is required to determine correctness, completeness, and reasonability of test-taker responses. The use of multifaceted Rasch modeling has led to improvements in the reliability of scoring such examinations. The establishment of criterion-referenced…
Descriptors: Interrater Reliability, High Stakes Tests, Standard Setting, Minimum Competencies

Stocking, Martha L.; And Others – 1988
A sequence of simulations was carried out to aid in the diagnosis and interpretation of equating differences found between random and matched (nonrandom) samples for four commonly used equating procedures: (1) Tucker linear observed-score equating; (2) Levine equally reliable linear observed-score equating; (3) equipercentile curvilinear…
Descriptors: Equated Scores, Item Response Theory, Sample Size, Simulation