Lewandowski, Lawrence J.; Martens, Brian K. – Journal of Reading, 1990 (peer reviewed)
Provides an approach for selecting and evaluating both group and individually administered standardized tests of reading. Reviews considerations of the quality of test development; test content; test reliability and validity; and concerns of cost and time investment. Presents sample ratings of two common instruments. (RS)
Descriptors: Reading Tests, Secondary Education, Standardized Tests, Test Reliability
O'Carroll, Patrick W. – Suicide and Life-Threatening Behavior, 1989 (peer reviewed)
Briefly outlines problems associated with definition and official certification of suicide and reviews literature pertaining to validity and reliability of suicide statistics. Considers process of suicide certification as a test, estimating its sensitivity, specificity, and predictive value, using data from studies reviewed. (NB)
Descriptors: Attrition (Research Studies), Death, Evaluation Problems, Reliability
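For readers unfamiliar with the quantities named in this abstract, the following is a minimal sketch of how sensitivity, specificity, and predictive values are computed from a 2x2 classification table. The function name and the counts are hypothetical illustrations and are not taken from the article.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard 2x2 diagnostic-accuracy measures.

    tp/fn: true cases certified / not certified as such.
    fp/tn: non-cases certified / not certified as such.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly excluded
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts, for illustration only (not data from the article).
metrics = diagnostic_metrics(tp=80, fp=10, fn=20, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```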
Umesh, U. N.; And Others – Educational and Psychological Measurement, 1989 (peer reviewed)
An approach is provided for calculating maximum values of the Kappa statistic of J. Cohen (1960) as a function of observed agreement proportions between evaluators. Separate calculations are required for different matrix sizes and observed agreement levels. (SLD)
Descriptors: Equations (Mathematics), Evaluators, Heuristics, Interrater Reliability
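As background for this record, here is a minimal sketch of the standard Cohen's kappa computation from a square agreement table. It shows only the usual definition, kappa = (p_o - p_e) / (1 - p_e), not the article's procedure for maximum values. The function name and the example counts are hypothetical.

```python
from typing import Sequence


def cohen_kappa(table: Sequence[Sequence[float]]) -> float:
    """Cohen's kappa for a square table of counts (rater A rows, rater B columns)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: sum over categories of the product of the marginals.
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_marg, col_marg))
    return (p_o - p_e) / (1 - p_e)


# Hypothetical 2x2 agreement table, for illustration only.
table = [[20, 5],
         [10, 15]]
kappa = cohen_kappa(table)
print(round(kappa, 3))
```

With these counts the observed agreement is 0.70 and the chance agreement is 0.50, so kappa works out to 0.40.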
Ellers, Robert A.; And Others – Journal of School Psychology, 1989 (peer reviewed)
Examined test-retest stability of Behavior Rating Profile for students grades 1-12 (N=198), parents (N=212), and teachers (N=176) on 3 norm-referenced scales. Found Teacher Rating scale reliable across all grades for screening and eligibility, Parent Rating scale reliable for Grade 3-12 screening and Grade 3-6, 11, and 12 eligibility. Found…
Descriptors: Behavior Rating Scales, Elementary Secondary Education, Special Education, Test Reliability
Humphreys, Lloyd G.; Drasgow, Fritz – Applied Psychological Measurement, 1989 (peer reviewed)
Issues arising from difference scores with zero reliability that nevertheless allow a powerful test of change are discussed. Issues include the appropriateness of underlying statistical models for psychological data and the relationship between difference scores and power. Increases in reliability always increase power for a fixed effect size.…
Descriptors: Goodness of Fit, Mathematical Models, Power (Statistics), Psychometrics
van den Wollenberg, Arnold L.; And Others – Applied Psychological Measurement, 1988 (peer reviewed)
The unconditional (simultaneous) maximum likelihood (UML) estimation procedure for the one-parameter logistic model produces biased estimators. The UML method is inconsistent and is not a good alternative to conditional maximum likelihood method, at least with small numbers of items. The minimum Chi-square estimation procedure produces unbiased…
Descriptors: Computer Simulation, Estimation (Mathematics), Maximum Likelihood Statistics, Reliability
Glutting, Joseph J. – Journal of School Psychology, 1989 (peer reviewed)
Introduces Stanford-Binet Intelligence Scale-Fourth Edition (SB4) as an attempt to revitalize Stanford-Binet by maintaining links with previous editions while simultaneously incorporating more recent developments found in other popular tests of intelligence. Discusses the SB4's theoretical foundation, materials and administration, scaling,…
Descriptors: Intelligence Tests, Models, Test Reliability, Test Use
Williams, Richard H.; And Others – Journal of Experimental Education, 1995 (peer reviewed)
The paradox that a Student t-test based on pretest-posttest differences can attain its greatest power when the difference score reliability is zero was explained by demonstrating that power is not a mathematical function of reliability unless either true score variance or error score variance is constant. (SLD)
Descriptors: Error of Measurement, Power (Statistics), Pretests Posttests, Reliability
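For context, the classical test theory expression for the reliability of a difference score $D = Y - X$ can be written as follows. The notation is an assumption made here for illustration and is not taken from the article: $\sigma_X, \sigma_Y$ are the pretest and posttest standard deviations, $\rho_{XX'}, \rho_{YY'}$ their reliabilities, and $\rho_{XY}$ the pretest-posttest correlation.

```latex
\rho_{DD'} =
  \frac{\sigma_X^2 \rho_{XX'} + \sigma_Y^2 \rho_{YY'} - 2 \sigma_X \sigma_Y \rho_{XY}}
       {\sigma_X^2 + \sigma_Y^2 - 2 \sigma_X \sigma_Y \rho_{XY}}
```

The denominator is the observed variance of $D$ and the numerator is its true-score variance, assuming uncorrelated measurement errors. When every examinee changes by the same true amount, the numerator is zero and so is the reliability, yet the mean change can still be large relative to the error variance.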
Cordes, Anne K.; Ingham, Roger J. – Journal of Speech and Hearing Research, 1994 (peer reviewed)
This paper reviews the prominent concepts of the stuttering event and concerns about the reliability of stuttering event measurements, specifically interjudge agreement. Recent attempts to resolve the stuttering measurement problem are reviewed, and the implications of developing an improved measurement system are discussed. (Author/JDD)
Descriptors: Data Collection, Interrater Reliability, Measurement Techniques, Observation
Diamond, Adele; And Others – Developmental Psychology, 1994 (peer reviewed)
Found that faulty test procedures may explain why infants sometimes locate hidden objects more easily in multiple-well tests than in two-well trials. Also found that errors in seven-well tests were not evenly distributed but occurred disproportionately in the direction of the previously correct well, suggesting that memory and inhibition are both…
Descriptors: Infants, Inhibition, Memory, Recall (Psychology)
Marcoulides, George A.; Simkin, Mark G. – Journal of Education for Business, 1995 (peer reviewed)
Each paper written by 60 sophomores in computer classes received 3 peer evaluations using a structured evaluation process. Overall, students were able to grade efficiently and consistently in terms of overall score and selected criteria (subject matter, content, and mechanics). (SK)
Descriptors: Higher Education, Interrater Reliability, Peer Evaluation, Undergraduate Students
Thompson, Robert J.; And Others – Journal of Consulting and Clinical Psychology, 1994 (peer reviewed)
Describes investigation utilizing sickle cell disease subjects from a stress and coping project. Found little stability in classification of individuals' adjustment, low congruence in behavior problem patterns and diagnoses, and less stability in adjustment by child report than mother report. Suggests children's coping strategies are intervention…
Descriptors: Children, Classification, Coping, Preadolescents
Driessen, Marie-Jose; And Others – Occupational Therapy Journal of Research, 1995 (peer reviewed)
Two occupational therapists in an interrater test and 9 in an intrarater test used a form based on the International Classification of Impairments, Disabilities, and Handicaps to evaluate 50 patients in a psychiatric hospital and 50 in a rehabilitation center. Based on percentage of agreement and Cohen's kappa, the reliability of the diagnoses was…
Descriptors: Clinical Diagnosis, Disabilities, Interrater Reliability, Occupational Therapy
Kennamer, J. David – Journalism Quarterly, 1992 (peer reviewed)
Investigates the use of "vague quantifiers" (terms such as "often," "sometimes," "rarely," or "never") in communication research. Finds that these words do not always mean the same thing to different people, and thus may not constitute interval scales. Suggests that research outcomes based upon such…
Descriptors: Communication Research, Higher Education, Research Methodology, Research Problems
Hansen, Jo-Ida C.; And Others – Journal of Vocational Behavior, 1993 (peer reviewed)
Multidimensional scaling was applied to Women-in-General (n=300) and Men-in-General (n=300) samples of the Strong Interest Inventory. Participants were matched on occupational title, obtaining two-dimensional solutions that demonstrated gender differences in the underlying structure of vocational interests. (SK)
Descriptors: Interest Inventories, Multidimensional Scaling, Sex Differences, Test Reliability


