[Peer reviewed] Puhl, Jackie – Perceptual and Motor Skills, 1980
The effectiveness of using video replays to judge gymnastics vaulting was examined by determining reliability of judges' scores and comparing mean scores for original (live) vaults and replays. Statistical analyses suggest that variability of scores was greater after the live vault, and that judges' reliability may be enhanced by replays.…
Descriptors: Adults, Gymnastics, Judges, Observation
[Peer reviewed] Feldt, Leonard S. – Psychometrika, 1980
Procedures are developed for testing the hypothesis that Cronbach's alpha reliability coefficient is equal for two tests given to the same subjects. (Author/JKS)
Descriptors: Error of Measurement, Hypothesis Testing, Measurement, Statistical Significance
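For readers unfamiliar with the coefficient named in the record above, here is a minimal sketch of how Cronbach's alpha is usually computed from a subjects-by-items score matrix. It shows only the coefficient itself, not the hypothesis test the article develops; the function name `cronbach_alpha` and the sample data are illustrative assumptions.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per subject, one column per item).

    Hypothetical helper for illustration; uses sample variances (n - 1 denominator).
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*scores))
    # Sum of the individual item variances.
    item_var_sum = sum(variance(col) for col in item_columns)
    # Variance of each subject's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Made-up data: three subjects answering two items.
example = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(example), 3))
```

Comparing two such coefficients obtained from the same subjects requires a test that accounts for their dependence, which is the problem the article addresses.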
[Peer reviewed] Wilcox, Rand R. – Educational and Psychological Measurement, 1979
The classical estimate of a binomial probability function is to estimate its mean in the usual manner and to substitute the results in the appropriate expression. Two alternative estimation procedures are described and examined. Emphasis is given to the single administration estimate of the mastery test reliability. (Author/CTM)
Descriptors: Cutting Scores, Mastery Tests, Probability, Scores
[Peer reviewed] Cudeck, Robert – Journal of Educational Measurement, 1980
Methods for evaluating the consistency of responses to test items were compared. When a researcher is unwilling to make the assumptions of classical test theory, has only a small number of items, or is in a tailored testing context, Cliff's dominance indices may be useful. (Author/CTM)
Descriptors: Error Patterns, Item Analysis, Test Items, Test Reliability
[Peer reviewed] Brennan, Robert L.; Lockwood, Robert E. – Applied Psychological Measurement, 1980
Generalizability theory is used to characterize and quantify expected variance in cutting scores and to compare the Nedelsky and Angoff procedures for establishing a cutting score. Results suggest that the restricted nature of the Nedelsky (inferred) probability scale may limit its applicability in certain contexts. (Author/BW)
Descriptors: Cutting Scores, Generalization, Statistical Analysis, Test Reliability
[Peer reviewed] Wakefield, James A., Jr. – Educational and Psychological Measurement, 1980
Studies in applied behavior analysis have used two expressions of reliability for human observations: percentage agreement and correlational techniques (including the phi coefficient). Formulas for converting percentage agreement scores to phi coefficients and vice versa are presented. (Author/RL)
Descriptors: Behavioral Science Research, Comparative Analysis, Correlation, Mathematical Formulas
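The two reliability expressions named in the record above can both be computed from the same 2x2 table of observer judgments. The sketch below uses the standard textbook definitions and does not reproduce the article's conversion formulas, which the abstract does not give; the function names and cell counts are illustrative assumptions.

```python
from math import sqrt


def percentage_agreement(a, b, c, d):
    """Proportion of intervals on which two observers agree.

    a: both record the behavior, d: neither records it,
    b and c: the two kinds of disagreement.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty table")
    return (a + d) / n


def phi_coefficient(a, b, c, d):
    """Phi coefficient (Pearson correlation for two binary variables)."""
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / sqrt(denom)


# Made-up counts for 100 observation intervals.
a, b, c, d = 40, 10, 5, 45
print(percentage_agreement(a, b, c, d))
print(round(phi_coefficient(a, b, c, d), 3))
```

Percentage agreement ignores how often the observers would agree by chance, whereas phi depends on the marginal totals, so the two numbers generally differ for the same table.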
[Peer reviewed] Fox, Robert A. – Journal of School Health, 1980
Some practical guidelines for developing multiple choice tests are offered. Included are three steps: (1) test design; (2) proper construction of test items; and (3) item analysis and evaluation. (JMF)
Descriptors: Guidelines, Objective Tests, Planning, Test Construction
Aiken, Lewis R. – New Directions for Testing and Measurement, 1980
A comprehensive overview of the beginnings of attitude measurement is merged with a discussion of recent developments. A synthesis of new research on technical issues related to the reliability and validity of attitude measures and contemporary views on attitude formation and change are presented. (Author)
Descriptors: Attitude Change, Attitude Measures, Test Reliability, Test Validity
[Peer reviewed] Asher, Steven R.; And Others – Developmental Psychology, 1979
Examined the test-retest reliability of a rating-scale sociometric technique with four-year-old children. Results showed test-retest reliability over a four-week interval was high compared to the stability of the traditional positive and negative nomination scores. (JMB)
Descriptors: Preschool Children, Preschool Education, Rating Scales, Sociometric Techniques
[Peer reviewed] Huynh, Huynh – Journal of Educational Measurement, 1976
Within the beta-binomial Bayesian framework, procedures are described for the evaluation of the kappa index of reliability on the basis of one administration of a domain-referenced test. Major factors affecting this index include cutoff score, test score variability and test length. Empirical data which substantiate some theoretical trends deduced…
Descriptors: Criterion Referenced Tests, Decision Making, Mathematical Models, Probability
[Peer reviewed] Subkoviak, Michael J. – Journal of Educational Measurement, 1976
A number of different reliability coefficients have recently been proposed for tests used to differentiate between groups such as masters and nonmasters. One promising index is the proportion of students in a class that are consistently assigned to the same mastery group across two testings. The present paper proposes a single test administration…
Descriptors: Criterion Referenced Tests, Mastery Tests, Mathematical Models, Probability
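The consistency index described in the record above, and the kappa index in the preceding record, can be illustrated with their straightforward two-administration versions. This is only a sketch under that assumption: the articles concern estimates from a single administration, which are not reproduced here. The function name, scores, and cutoff are hypothetical.

```python
def mastery_consistency(scores_1, scores_2, cutoff):
    """Return (p0, kappa) for mastery decisions on two testings.

    p0 is the proportion of examinees placed in the same mastery group
    both times; kappa corrects p0 for chance agreement.
    """
    if len(scores_1) != len(scores_2) or not scores_1:
        raise ValueError("need two equal-length, non-empty score lists")
    n = len(scores_1)
    m1 = [s >= cutoff for s in scores_1]  # mastery decisions, testing 1
    m2 = [s >= cutoff for s in scores_2]  # mastery decisions, testing 2
    p0 = sum(x == y for x, y in zip(m1, m2)) / n
    p1, p2 = sum(m1) / n, sum(m2) / n
    # Agreement expected by chance from the marginal mastery rates.
    pc = p1 * p2 + (1 - p1) * (1 - p2)
    if pc == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return p0, (p0 - pc) / (1 - pc)


# Made-up scores for eight students, mastery cutoff of 6.
first = [8, 9, 4, 3, 7, 6, 2, 9]
second = [9, 8, 5, 2, 5, 7, 3, 8]
p0, kappa = mastery_consistency(first, second, cutoff=6)
print(p0, kappa)
```

Moving the cutoff changes both the marginal mastery rates and the chance-agreement term, which is one way to see why cutoff score appears among the factors affecting the kappa index.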
Kazelskis, Richard – Southern Journal of Educational Research, 1977
Estimates of the internal consistency and reliability of the first principal component are provided through the use of the largest characteristic root and associated vector of the equicorrelation matrix. The estimate of the internal consistency is also shown to be a lower bound for the measure provided by Horn (1969). (Author)
Descriptors: Correlation, Equated Scores, Factor Analysis, Matrices
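As background for the record above: a p x p equicorrelation matrix with common correlation r > 0 has largest characteristic root 1 + (p - 1)r, with the equal-weights vector as its eigenvector. The sketch below checks this numerically and shows the well-known identity linking that root to standardized alpha (the Spearman-Brown form). Whether this matches the article's estimate exactly is an assumption, since the abstract does not state the formula; all function names are illustrative.

```python
def equicorrelation_matrix(p, r):
    """p x p matrix with 1 on the diagonal and r elsewhere."""
    return [[1.0 if i == j else r for j in range(p)] for i in range(p)]


def largest_root(p, r):
    """Largest characteristic root of the equicorrelation matrix (for r > 0)."""
    return 1 + (p - 1) * r


def alpha_from_root(p, lam):
    """Internal consistency expressed through the largest root."""
    return (p / (p - 1)) * (1 - 1 / lam)


def standardized_alpha(p, r):
    """Spearman-Brown form: reliability of a p-item composite with mean correlation r."""
    return p * r / (1 + (p - 1) * r)


p, r = 5, 0.3
R = equicorrelation_matrix(p, r)
lam = largest_root(p, r)
# Multiplying R by the all-ones vector scales it by lam, confirming the eigenpair.
product = [sum(row) for row in R]
print(all(abs(v - lam) < 1e-12 for v in product))
print(round(alpha_from_root(p, lam), 4), round(standardized_alpha(p, r), 4))
```

The remaining p - 1 roots all equal 1 - r, so for positive r the first principal component is the only one with a root above 1.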
[Peer reviewed] Lovett, Hubert T. – Educational and Psychological Measurement, 1977
The analysis of variance model for estimating reliability in norm referenced tests is extended to criterion referenced tests. The essential modification is that the criterion or cut-off score is substituted for the population mean. An example and discussion are presented. (JKS)
Descriptors: Analysis of Variance, Criterion Referenced Tests, Cutting Scores, Test Reliability
[Peer reviewed] May, Kim O.; Nicewander, W. Alan – Journal of Educational Measurement, 1997
Dato de Gruijter is correct in the recent conclusion that one equation derived by the present authors should be changed to reflect that it is an approximation. The authors nonetheless argue that their original claim holds: percentile ranks for difficult tests can have substantially lower reliability and information than the corresponding number-correct scores. (SLD)
Descriptors: Equations (Mathematics), Estimation (Mathematics), Raw Scores, Reliability
[Peer reviewed] Feldt, Leonard S. – Applied Measurement in Education, 1997
It has often been asserted that the reliability of a measure places an upper limit on its validity. This article demonstrates in theory that validity can rise when reliability declines, even when validity evidence is a correlation with an acceptable criterion. Whether empirical examples can actually be found is an open question. (SLD)
Descriptors: Correlation, Criteria, Reliability, Test Construction


