Rogosa, David – 2000
In the reporting of individual student results from standardized tests in educational assessments, the percentile rank of the individual student is a major numerical indicator. For example, in the 1998 and 1999 California Standardized Testing and Reporting (STAR) program using the Stanford Achievement Test Series, Ninth Edition, Form T (Stanford…
Descriptors: Comparative Analysis, Elementary Secondary Education, Standardized Tests, Tables (Data)

Adler, Nurit; Guttman, Ruth – Educational and Psychological Measurement, 1982
Thirteen ability tests were administered as defined within a mapping sentence containing four content facets: rule type, expression mode, language of communication and dimensionality of portrayed object. Smallest Space Analysis of intercorrelations among test scores showed the radex structure of the two-dimensional space conformed to the…
Descriptors: Content Analysis, Factor Structure, Intelligence Tests, Scores

Jarjoura, David; Brennan, Robert L. – New Directions for Testing and Measurement, 1983
Multivariate generalizability techniques are used to bridge the gap between psychometric constraints and the tables of specifications needed in test development. Techniques are illustrated with results from the American College Testing Assessment Program. (Author/PN)
Descriptors: Data Analysis, Mathematical Models, Multivariate Analysis, Test Construction

Williams, Richard H.; Zimmerman, Donald W. – Journal of Experimental Education, 1980
It is suggested that error of measurement cannot be routinely incorporated into the "error term" in statistical tests, and that the reliability of test scores does not have the simple relationship to statistical inference that one might expect. (Author/GK)
Descriptors: Error of Measurement, Hypothesis Testing, Mathematical Formulas, Test Reliability

Green, Bert F. – American Psychologist, 1981
Discusses classical test theory, including test construction, administration, and use. Covers basic statistical concepts in measurement, reliability, and validity; principles of sound test construction and item analysis; test administration and scoring; procedures for transforming raw test data into scaled scores; and future prospects in test…
Descriptors: Scores, Statistics, Test Construction, Test Interpretation

Irvine, S.H.; Reuning, H. – Journal of Cross-Cultural Psychology, 1981
Under experimental conditions, Canadian students were subjected to group tests on simple cognitive tasks constructed on the basis of previous theoretical work on perceptual speed. Cross-cultural replications suggest that such tests can be validated within and across cultures. (Author/MJL)
Descriptors: Adolescents, Cognitive Tests, Cross Cultural Studies, Perception

Zumbo, Bruno D.; Pope, Gregory A.; Watson, Jackie E.; Hubley, Anita M. – Educational and Psychological Measurement, 1997
E. Roskam's (1985) conjecture that steeper item characteristic curve (ICC) "a" parameters (slopes) (and higher item total correlations in classical test theory) would be found with more concretely worded test items was tested with results from 925 young adults on the Eysenck Personality Questionnaire (H. Eysenck and S. Eysenck, 1975).…
Descriptors: Correlation, Personality Assessment, Personality Measures, Test Interpretation

Chase, Clint – Mid-Western Educational Researcher, 1996
Classical procedures for calculating the two indices of decision consistency (P and Kappa) for criterion-referenced tests require two testings on each child. Huynh, Peng, and Subkoviak have presented one-testing procedures for these indices. These indices can be estimated without any test administration using Ebel's estimates of the mean, standard…
Descriptors: Criterion Referenced Tests, Educational Research, Educational Testing, Estimation (Mathematics)
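The two indices named above can be computed directly once mastery/nonmastery classifications from two testings are cross-tabulated. The sketch below is a generic illustration of agreement (P) and kappa from a 2x2 table, not the one-testing procedures of Huynh, Peng, and Subkoviak; the cell counts are hypothetical.

```python
def decision_consistency(n11, n10, n01, n00):
    """P (raw agreement) and kappa for mastery classifications from two
    test administrations, given a 2x2 cross-table of counts:
    n11 = master/master, n00 = nonmaster/nonmaster, n10 and n01 = disagreements."""
    n = n11 + n10 + n01 + n00
    p = (n11 + n00) / n                      # proportion classified consistently
    # chance agreement expected from the marginal mastery rates
    p1 = (n11 + n10) / n                     # mastery rate, first testing
    p2 = (n11 + n01) / n                     # mastery rate, second testing
    p_chance = p1 * p2 + (1 - p1) * (1 - p2)
    kappa = (p - p_chance) / (1 - p_chance)  # agreement corrected for chance
    return p, kappa

p, kappa = decision_consistency(40, 10, 10, 40)  # P = 0.8, kappa = 0.6
```

Kappa discounts the agreement that would occur by chance alone, which is why it is reported alongside P for criterion-referenced cut-score decisions.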

Embretson, Susan E. – Applied Psychological Measurement, 1996
Conditions under which interaction effects estimated from classical total scores, rather than item response theory trait scores, can be misleading are discussed with reference to analysis of variance (ANOVA). When no interaction effects exist on the true latent variable, spurious interaction effects can be observed from the total score scale. (SLD)
Descriptors: Analysis of Variance, Interaction, Item Response Theory, Models
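The spurious-interaction phenomenon the abstract describes can be reproduced with a toy calculation (the model, item difficulties, and group means below are my own illustration, not Embretson's): two groups with identical latent gains show unequal expected number-correct gains, because the total-score scale is a nonlinear function of the trait and compresses near the ceiling.

```python
import math

def expected_total(theta, difficulties, a=1.7):
    """Expected number-correct score for trait level theta under a
    one-parameter logistic model (illustrative choice of model)."""
    return sum(1 / (1 + math.exp(-a * (theta - b))) for b in difficulties)

# Hypothetical 20-item test with difficulties clustered around 0
items = [-1 + 0.1 * i for i in range(20)]

# Two groups, each gaining exactly 1.0 on the latent scale
low_pre, low_post = -1.0, 0.0
high_pre, high_post = 1.0, 2.0

# No interaction on the latent scale ...
latent_interaction = (high_post - high_pre) - (low_post - low_pre)  # 0.0

# ... but a nonzero "interaction" on the total-score scale
gain_low = expected_total(low_post, items) - expected_total(low_pre, items)
gain_high = expected_total(high_post, items) - expected_total(high_pre, items)
score_interaction = gain_high - gain_low  # negative: high group "gains less"
```

The high-ability group's expected-score gain is smaller only because its scores sit near the test ceiling, which is exactly the artifact that an ANOVA on total scores can mistake for a real group-by-time interaction.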

Humphreys, Lloyd G. – Applied Psychological Measurement, 1996
The reliability of a gain is determined by the reliabilities of the components, the correlation between them, and their standard deviations. Reliability is not inherently low, but the components of gains in many investigations make low reliability likely and require caution in the use of gain scores. (SLD)
Descriptors: Achievement Gains, Change, Correlation, Error of Measurement
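The determinants listed in the abstract combine in the classical formula for the reliability of a difference score; a minimal sketch (function and variable names are mine, and the numerical inputs are hypothetical):

```python
def gain_reliability(r_xx, r_yy, r_xy, sd_x, sd_y):
    """Reliability of the gain Y - X under classical test theory:
    a function of the component reliabilities (r_xx, r_yy), the
    correlation between components (r_xy), and their standard deviations."""
    num = sd_x ** 2 * r_xx + sd_y ** 2 * r_yy - 2 * sd_x * sd_y * r_xy
    den = sd_x ** 2 + sd_y ** 2 - 2 * sd_x * sd_y * r_xy
    return num / den

# Highly correlated pre/post scores make the gain unreliable ...
r_high_corr = gain_reliability(0.9, 0.9, 0.8, 1.0, 1.0)  # 0.5
# ... while a lower pre/post correlation leaves the gain quite reliable.
r_low_corr = gain_reliability(0.9, 0.9, 0.4, 1.0, 1.0)   # ~0.83
```

The comparison illustrates Humphreys' point: gain reliability is not inherently low, but the typical combination of reliable, highly correlated pre- and post-tests drives it down.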

Williams, Richard H.; Zimmerman, Donald W. – Applied Psychological Measurement, 1996
The critiques by L. Collins and L. Humphreys in this issue illustrate problems with the use of gain scores. Collins' examples show that familiar formulas for the reliability of differences do not reflect the precision of measures of change. Additional examples demonstrate flaws in the conventional approach to reliability. (SLD)
Descriptors: Achievement Gains, Change, Correlation, Error of Measurement

Fan, Xitao – Educational and Psychological Measurement, 1998
This study empirically examined the behaviors of item and person statistics derived from item response theory and classical test theory, using data from a large-scale statewide assessment. Findings show that the person and item statistics from the two measurement frameworks are quite comparable. (SLD)
Descriptors: Item Response Theory, State Programs, Statistical Analysis, Test Items

Banerji, Madhabi – Journal of Applied Measurement, 2000
Validated data from a developmental mathematics assessment using classical and three-faceted Rasch measurement methods. Analysis of field test data for 289 elementary school students suggested that a unidimensional construct was being measured, as defined by Rasch criteria. Discusses limitations in confirming content-related validity of the…
Descriptors: Construct Validity, Content Validity, Elementary Education, Elementary School Students

Traub, Ross E. – Educational Measurement: Issues and Practice, 1997
Classical test theory is founded on the proposition that measurement error, a random latent variable, is a component of the observed score random variable. This article traces the history of the development of classical test theory, beginning in the early 20th century. (SLD)
Descriptors: Educational History, Educational Testing, Error of Measurement, Psychometrics
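The proposition Traub traces, that an observed score is a true score plus random error, yields the standard definition of reliability as the true-score share of observed variance. A minimal sketch of that identity (the example variances are hypothetical):

```python
def reliability(var_true, var_error):
    """Classical test theory: X = T + E with E uncorrelated with T,
    so var(X) = var(T) + var(E) and reliability = var(T) / var(X)."""
    var_observed = var_true + var_error
    return var_true / var_observed

rho = reliability(var_true=9.0, var_error=1.0)  # 0.9
```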

Beland, Anne; Mislevy, Robert J. – Journal of Educational Measurement, 1996
This article addresses issues in model building and statistical inference in the context of student modeling. The use of probability-based reasoning to explicate hypothesized and empirical relationships and to structure inference in the context of proportional reasoning tasks is discussed. Ideas are illustrated with an example concerning…
Descriptors: Cognitive Psychology, Models, Networks, Probability