Publication Date
| In 2026 | 7 |
| Since 2025 | 690 |
| Since 2022 (last 5 years) | 3191 |
| Since 2017 (last 10 years) | 7432 |
| Since 2007 (last 20 years) | 15070 |
Descriptor
| Test Reliability | 15055 |
| Test Validity | 10290 |
| Reliability | 9763 |
| Foreign Countries | 7150 |
| Test Construction | 4828 |
| Validity | 4192 |
| Measures (Individuals) | 3880 |
| Factor Analysis | 3826 |
| Psychometrics | 3532 |
| Interrater Reliability | 3126 |
| Correlation | 3040 |
Audience
| Researchers | 709 |
| Practitioners | 451 |
| Teachers | 208 |
| Administrators | 122 |
| Policymakers | 66 |
| Counselors | 42 |
| Students | 38 |
| Parents | 11 |
| Community | 7 |
| Support Staff | 6 |
| Media Staff | 5 |
Location
| Turkey | 1329 |
| Australia | 436 |
| Canada | 379 |
| China | 368 |
| United States | 271 |
| United Kingdom | 256 |
| Indonesia | 253 |
| Taiwan | 234 |
| Netherlands | 224 |
| Spain | 218 |
| California | 215 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 8 |
| Meets WWC Standards with or without Reservations | 9 |
| Does not meet standards | 6 |
Hofmann, Richard J. – Educational and Psychological Measurement, 1978 (peer reviewed)
The Goodenough technique for determining scale error is compared with the Guttman technique and shown to be more conservative. Implications for Guttman's rule of thumb for evaluating a coefficient of reproducibility are noted. (Author)
Descriptors: Comparative Analysis, Rating Scales, Statistical Analysis, Test Reliability
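For readers unfamiliar with the statistic at issue, here is a minimal Python sketch of a Guttman coefficient of reproducibility, CR = 1 - errors / (respondents × items). It assumes 0/1 item scores with items already ordered from most to least endorsed, and it counts an error as any response that differs from the perfect pattern implied by the respondent's total score. That error rule is an assumption chosen for illustration; it is not necessarily either of the procedures compared in the article. The function name `reproducibility` is hypothetical.

```python
def reproducibility(responses):
    """Coefficient of reproducibility for a Guttman scale (illustrative sketch).

    responses: one list of 0/1 scores per respondent, with items assumed to be
    ordered from easiest (most endorsed) to hardest.
    Assumed error rule: a response is an error when it differs from the
    perfect pattern implied by the respondent's total score.
    """
    n_items = len(responses[0])
    errors = 0
    for row in responses:
        total = sum(row)
        # Perfect Guttman pattern: pass the `total` easiest items, fail the rest.
        ideal = [1] * total + [0] * (n_items - total)
        errors += sum(1 for observed, expected in zip(row, ideal) if observed != expected)
    return 1 - errors / (len(responses) * n_items)
```

A perfectly cumulative data set such as `[[1, 1, 0], [1, 0, 0], [1, 1, 1]]` returns 1.0. The benchmark conventionally cited as Guttman's rule of thumb is a coefficient of at least 0.90, so a more conservative error count makes that benchmark harder to reach.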
Colonius, Hans – Psychometrika, 1977 (peer reviewed)
Parameter estimation for Keats's generalization of the Rasch model, which takes account of guessing behavior, is investigated. It is shown that there exist no minimal sufficient statistics for the ability parameters that are independent of the difficulty parameters. (Author/JKS)
Descriptors: Guessing (Tests), Item Analysis, Test Construction, Test Reliability
Callender, John C.; Osburn, H. G. – Educational and Psychological Measurement, 1977 (peer reviewed)
A FORTRAN program for maximizing and cross-validating split-half reliability coefficients is described. Externally computed arrays of item means and covariances are used as input for each of two samples. The user may select a number of subsets from the complete set of items for analysis in a single run. (Author/JKS)
Descriptors: Computer Programs, Item Analysis, Test Reliability, Test Validity
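Because only item covariances are needed as input, any split of the items can be scored directly from the covariance matrix. The sketch below is written in Python rather than FORTRAN and is not a reconstruction of the authors' program. It sums covariance blocks to get the two half-test variances and their covariance, applies the Spearman-Brown correction 2r / (1 + r), and searches every split of size k // 2. Exhaustive search is an assumption here and is practical only for small item sets. The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def split_half_reliability(cov, half):
    """Spearman-Brown corrected split-half coefficient for one split.

    cov:  k x k item covariance matrix (list of lists), computed elsewhere.
    half: tuple of item indices forming the first half; the rest form the second.
    """
    k = len(cov)
    other = [j for j in range(k) if j not in half]
    # Variance of a sum of items = sum of every covariance within that set.
    var_a = sum(cov[i][j] for i in half for j in half)
    var_b = sum(cov[i][j] for i in other for j in other)
    # Covariance between the two half-test totals.
    cov_ab = sum(cov[i][j] for i in half for j in other)
    r = cov_ab / sqrt(var_a * var_b)
    return 2 * r / (1 + r)  # Spearman-Brown step-up to full test length


def best_split(cov):
    """Return (coefficient, half) for the split of size k // 2 with the largest value."""
    k = len(cov)
    best = None
    for half in combinations(range(k), k // 2):
        rel = split_half_reliability(cov, half)
        if best is None or rel > best[0]:
            best = (rel, half)
    return best
```

A maximized coefficient capitalizes on chance, so the split chosen in one sample would then be re-scored on the other sample, e.g. `split_half_reliability(cov_sample2, best_split(cov_sample1)[1])`, mirroring the two-sample cross-validation the abstract mentions.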
Kagan, Norman; Schneider, John – Journal of Counseling & Development, 1987 (peer reviewed)
Describes some of the theoretical bases of the Affective Sensitivity Scale and reports research data on revisions made since 1970. Proposes theoretical constructs to explain the role of affective sensitivity in the process of empathy. (Author/ABB)
Descriptors: Affective Measures, Empathy, Test Reliability, Test Validity
Cliff, Norman – Journal of Educational Statistics, 1984 (peer reviewed)
The proposed coefficient is derived by assuming that the average Goodman-Kruskal gamma between items of identical difficulty would equal the average gamma between items of different difficulty. An estimate of the covariance between items of identical difficulty leads to an estimate of the correlation between two tests with identical distributions of difficulty.…
Descriptors: Difficulty Level, Mathematical Formulas, Test Items, Test Reliability
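The coefficient above builds on Goodman-Kruskal gamma between pairs of items. The sketch below computes only that building block, gamma = (C - D) / (C + D), where C and D count concordant and discordant pairs of respondents and tied pairs are dropped. It does not reproduce the proposed reliability coefficient itself. The function name is hypothetical, and the function raises `ZeroDivisionError` when every pair is tied.

```python
def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal gamma between two ordinal score vectors (e.g. two items)."""
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            # Same sign of difference on both variables -> concordant pair.
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
            # product == 0 means the pair is tied on x or y and is ignored.
    return (concordant - discordant) / (concordant + discordant)
```

For dichotomous items, most pairs are tied, which is why gamma can be large even when the ordinary correlation between items of different difficulty is small.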
Hosie, Peter – Australian Journal of Education, 1986 (peer reviewed)
Interviews can provide valuable information for social researchers, but problems that may affect the quality of the information gathered should be addressed. These include subject-researcher reactivity, role relations, truth telling, reporting of the information collected, and researcher characteristics. A profile of effective interviewer…
Descriptors: Interrater Reliability, Interviews, Questioning Techniques, Research Methodology
Gray, Jeffrey W.; And Others – Psychology in the Schools, 1987 (peer reviewed)
Examined the test-retest stability of the Maternal Perinatal Scale in 41 mothers. Item stability over a two-day period, together with intercorrelations among the specific information assessed by the items, supports the clinical and research potential of a systematic self-report format for assessing perinatal histories. (Author/NB)
Descriptors: Mothers, Perinatal Influences, Self Evaluation (Individuals), Test Reliability
Zuravin, Susan J.; And Others – Child Abuse and Neglect: The International Journal, 1987
Anonymous reports (n=155) of child physical abuse in Baltimore (MD) were compared with reports made by professionals (n=588) and nonprofessionals (n=262) in terms of substantiation rate, seriousness of substantiated incidents, and severity of allegations. While anonymous reports were more likely to be unfounded, those that were substantiated were…
Descriptors: Child Abuse, Comparative Analysis, Professional Personnel, Reliability
Hughes, Garry L.; Prien, Erich P. – Personnel Psychology, 1986 (peer reviewed)
Investigated psychometric properties of three methods of scoring a Mixed Standard Scale performance evaluation: a patterned procedure, a simple nonpatterned procedure, and a procedure assigning differential weights to statements on the basis of scale values provided by subject matter experts. Found no differences in the score distribution…
Descriptors: Evaluation Methods, Interrater Reliability, Scoring, Scoring Formulas
Miller, Ivan W.; And Others – Journal of Marital and Family Therapy, 1985 (peer reviewed)
Reports series of studies investigating reliability and validity of the McMaster Family Assessment Device (FAD). Results indicated that the FAD has: (1) adequate test-retest reliability, (2) low correlations with social desirability, (3) moderate correlations with other self-report measures of family functioning, and (4) differentiates…
Descriptors: Family Life, Family Problems, Test Reliability, Test Validity
Weeks, David J. – Journal of Clinical Psychology, 1986 (peer reviewed)
Presents a brief clinical test, derived from earlier neuropsychological instruments, with evidence for its reliability, interscorer agreement, and validity. The latter is based upon correlations with both CAT scan measures of cortical atrophy and ventricular enlargement, as well as correlations with seven other previously validated cognitive…
Descriptors: Cognitive Tests, Neurological Impairments, Test Reliability, Test Validity
Sackett, Paul R.; Harris, Michael M. – Personnel Psychology, 1984 (peer reviewed)
Describes paper-and-pencil tests for predicting employee theft and examines studies of the validity, reliability, and adverse impact of these tests. Results showed consistently positive correlations but identified a variety of methodological differences that make direct comparison of test validities suspect. (LLL)
Descriptors: Employees, Honesty, Predictor Variables, Test Reliability
Baran, Jane A.; Gengel, Roy W. – Language, Speech, and Hearing Services in Schools, 1984 (peer reviewed)
The study examined the test-retest reliability of three subtests of the Goldman-Fristoe-Woodcock Auditory Skills Test Battery (Diagnostic Auditory Discrimination Test, Auditory Selective Attention Test, and Auditory Memory Tests) with 20 children aged 5 to 12. The only significant test-retest differences were on the Selective Attention subtest.…
Descriptors: Elementary Education, Language Handicaps, Language Tests, Test Reliability
Brulle, Andrew R.; Hoernicke, Placido Arturo – Mental Retardation and Learning Disability Bulletin, 1984 (peer reviewed)
A reliability study of the public school version of the American Association on Mental Deficiency Adaptive Behavior Scale (ABS) calculated three different reliability estimates: exact agreement, kappa, and weighted average. Results demonstrated that while the ABS may be useful in making placement decisions, lack of exact reliability raises…
Descriptors: Adaptive Behavior (of Disabled), Mental Retardation, Test Reliability
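Exact agreement and Cohen's kappa, two of the three estimates named above, can be computed from two raters' category labels as in the minimal Python sketch below. Kappa corrects the observed agreement p_o for the agreement p_e expected from each rater's marginal proportions: kappa = (p_o - p_e) / (1 - p_e). The "weighted average" estimate is not sketched because the abstract does not define it, and the function names are hypothetical.

```python
from collections import Counter


def exact_agreement(r1, r2):
    """Proportion of cases on which the two raters assign the same category."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' nominal ratings of the same cases."""
    n = len(r1)
    p_o = exact_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    p_e = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

For example, with ratings `a, a, b, b` and `a, b, b, b`, exact agreement is 0.75 but chance agreement is 0.5, so kappa is only 0.5. That gap is why the two estimates can lead to different conclusions about the same scale.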
Zook, Avery, II; Sipps, Gary J. – Journal of Clinical Psychology, 1985 (peer reviewed)
Presents a cross-validation of Reynolds' short form of the Marlowe-Crowne Social Desirability Scale (N=233). Researchers administered 13 items as a separate entity, calculated Cronbach's Alpha for each sex, and computed test-retest correlation for one group. Concluded that the short form is a viable alternative. (Author/NRB)
Descriptors: College Students, Sex Differences, Test Reliability, Test Validity
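Cronbach's alpha, which the study computed separately for each sex, is alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores) for k items. A minimal Python sketch follows, assuming a complete respondent-by-item score matrix with no missing responses. The function name is hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores has one row per respondent and one column per item."""
    k = len(scores[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of respondents' total scores.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Calling it once on the male respondents' rows and once on the female respondents' rows gives per-sex coefficients. Population and sample variances give the same alpha as long as one kind is used throughout, because the n / (n - 1) factors cancel in the ratio.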


