Publication Date
| In 2026 | 7 |
| Since 2025 | 690 |
| Since 2022 (last 5 years) | 3191 |
| Since 2017 (last 10 years) | 7432 |
| Since 2007 (last 20 years) | 15070 |
Descriptor
| Test Reliability | 15055 |
| Test Validity | 10290 |
| Reliability | 9763 |
| Foreign Countries | 7150 |
| Test Construction | 4828 |
| Validity | 4192 |
| Measures (Individuals) | 3880 |
| Factor Analysis | 3826 |
| Psychometrics | 3532 |
| Interrater Reliability | 3126 |
| Correlation | 3040 |
Audience
| Researchers | 709 |
| Practitioners | 451 |
| Teachers | 208 |
| Administrators | 122 |
| Policymakers | 66 |
| Counselors | 42 |
| Students | 38 |
| Parents | 11 |
| Community | 7 |
| Support Staff | 6 |
| Media Staff | 5 |
Location
| Turkey | 1329 |
| Australia | 436 |
| Canada | 379 |
| China | 368 |
| United States | 271 |
| United Kingdom | 256 |
| Indonesia | 253 |
| Taiwan | 234 |
| Netherlands | 224 |
| Spain | 218 |
| California | 215 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 8 |
| Meets WWC Standards with or without Reservations | 9 |
| Does not meet standards | 6 |
Bardo, John W.; Graney, Marshall J. – Southern Journal of Educational Research, 1979
Investigating the use of maximum versus averaged scores in physical and motor multiple-trial tests as indicators of performance, this article concludes that the use of mean scores is still most appropriate for scientific estimation of true performance given multiple fallible empirical measures. (JC)
Descriptors: Performance, Psychomotor Skills, Reliability, Scores
Peer reviewed: Bradley, John M.; And Others – Journal of Reading Behavior, 1978
The present study was designed to determine if maze tests constructed over the same passages by different teachers were comparable. In addition, maze test parallel form reliability was investigated. (HOD)
Descriptors: Educational Research, Reading Comprehension, Reading Tests, Test Reliability
Peer reviewed: Henggeler, Scott W.; Tavormina, Joseph B. – Hispanic Journal of Behavioral Sciences, 1979
The one-year stabilities of several well-standardized intellectual, educational, and personality tests were evaluated for 15 children of Mexican American migrant workers. Most of the stability coefficients observed for these tests were statistically significant and similar to those reported for their normative samples. However, the stability…
Descriptors: Mexican Americans, Migrant Children, Psychological Testing, Test Reliability
Peer reviewed: Jackson, Paul H. – Psychometrika, 1979
Use of the same term "split-half" for division of an n-item test into two subtests containing equal (Cronbach), and possibly unequal (Guttman), numbers of items sometimes leads to a misunderstanding about the relation between Guttman's maximum split-half bound and Cronbach's coefficient alpha. This distinction is clarified. (Author/JKS)
Descriptors: Item Analysis, Mathematical Formulas, Technical Reports, Test Reliability
Peer reviewed: Reynolds, Cecil R. – Psychology in the Schools, 1979
Two doctoral-level school psychologists independently scored 50 McCarthy drawing booklets. Children producing the drawings ranged from 5-11. Interscorer reliability for Draw-A-Design was .93 and for Draw-A-Child was .96. No significant differences occurred in the mean score for either test across scorers. (Author)
Descriptors: Children, Elementary Education, Scoring, Test Reliability
Peer reviewed: Pitts, Steven C.; And Others – Evaluation and Program Planning, 1996
An introduction is provided to the use of confirmatory factor analysis to test measurement invariance and stability in longitudinal research. The approach is illustrated through examples representing one or two constructs in one to three measurement waves. Basic issues in establishing measurement invariance are discussed. (SLD)
Descriptors: Evaluation Research, Longitudinal Studies, Measurement Techniques, Models
Peer reviewed: Trimble, Douglas E. – Educational and Psychological Measurement, 1997
Studies of the reliability and validity of scores on the Religious Orientation Scale (G. Allport and J. Ross, 1967) were reviewed with respect to social desirability. Meta analysis shows that one scale correlates with social desirability, but another does not, suggesting that partialing out this variance is not recommended. (SLD)
Descriptors: Correlation, Meta Analysis, Reliability, Scores
Peer reviewed: Wolfe, Edward W.; Nogle, Sally – Journal of Applied Measurement, 2002
Developed and validated an instrument designed to measure the perceived measurability and importance of the National Athletic Trainers' Association Athletic Training Educational Competencies. Data from 931 athletic trainers and sport medicine physicians support 6 constructs, each of which demonstrates high reliability. (SLD)
Descriptors: Athletics, Competence, Criteria, Measurement Techniques
Peer reviewed: Reese, Robert J.; Kieffer, Kevin M.; Briggs, Barbara K. – Educational and Psychological Measurement, 2002
Conducted a reliability generalization study of five of the most prominent adult attachment style measures. Results from this investigation of 154 previously published studies indicate that the average score reliabilities across studies varied considerably across instruments and subscales. (SLD)
Descriptors: Adults, Attachment Behavior, Generalization, Meta Analysis
Peer reviewed: Henson, Robin K.; Hwang, Dae-Yeop – Educational and Psychological Measurement, 2002
Conducted a reliability generalization study of Kolb's Learning Style Inventory (LSI; D. Kolb, 1976). Results for 34 studies indicate that internal consistency and test-retest reliabilities for LSI scores fluctuate considerably and contribute to deleterious cumulative measurement error. (SLD)
Descriptors: Error of Measurement, Generalization, Meta Analysis, Reliability
Peer reviewed: Dimitrov, Dimiter M. – Educational and Psychological Measurement, 2002
Discusses reliability issues in light of recent studies and debates focused on psychometrics versus datametrics terminology and reliability generalization. Discusses the way multiple perspectives on score reliability may affect research practice, editorial policies, and reliability generalization across studies. (SLD)
Descriptors: Generalization, Meta Analysis, Psychometrics, Reliability
Peer reviewed: Rosenthal, J. D. Robert – Developmental Review, 2002
Provides information to help lawyers and expert witnesses understand how well-established legal principles demand the exclusion of suggestion-induced accusation in child abuse cases just as they do suggestion-induced identifications. Discusses legal arguments that support the exclusion of accusations obtained through suggestion and provides an…
Descriptors: Child Abuse, Children, Evidence (Legal), Legal Problems
Peer reviewed: Youngstrom, Eric A.; Green, Kristen W. – Educational and Psychological Measurement, 2003
Used reliability generalization to synthesize findings from 30 samples of raw data, involving 2,407 participants, about the self-reporting of emotions using the Differential Emotions Scale (C. Izard and others, 1993). Higher socioeconomic status is positively associated with increased internal consistency; gender appears unrelated to reliability…
Descriptors: Adults, Emotional Response, Generalization, Reliability
Peer reviewed: Prosnick, Kevin P.; Evans, William J.; Farris, Jaelyn R. – Measurement and Evaluation in Counseling and Development, 2003
This research reports the development and psychometric properties of scores from the 10-item Short Index of Self-Directedness (SISD), drawn from the Temperament and Character Inventory (TCI; C. R. Cloninger, 1987/1992a) and the TCI-125 (C. R. Cloninger, 1992b). Factor structure, construct validity, internal consistency, and test-retest reliability…
Descriptors: Factor Structure, Measures (Individuals), Personality, Psychometrics
Peer reviewed: Berry, Kenneth J.; Mielke, Paul W., Jr. – Educational and Psychological Measurement, 1997
A FORTRAN subroutine is presented to calculate a generalized measure of agreement between multiple raters and a set of correct responses at any level of measurement and among multiple responses, along with the associated probability value, under the null hypothesis. (Author)
Descriptors: Computer Software, Interrater Reliability, Measurement Techniques, Probability


