Publication Date
| Period | Records |
|---|---|
| In 2026 | 7 |
| Since 2025 | 690 |
| Since 2022 (last 5 years) | 3191 |
| Since 2017 (last 10 years) | 7432 |
| Since 2007 (last 20 years) | 15070 |
Descriptor
| Descriptor | Records |
|---|---|
| Test Reliability | 15055 |
| Test Validity | 10290 |
| Reliability | 9763 |
| Foreign Countries | 7150 |
| Test Construction | 4828 |
| Validity | 4192 |
| Measures (Individuals) | 3880 |
| Factor Analysis | 3826 |
| Psychometrics | 3532 |
| Interrater Reliability | 3126 |
| Correlation | 3040 |
Audience
| Audience | Records |
|---|---|
| Researchers | 709 |
| Practitioners | 451 |
| Teachers | 208 |
| Administrators | 122 |
| Policymakers | 66 |
| Counselors | 42 |
| Students | 38 |
| Parents | 11 |
| Community | 7 |
| Support Staff | 6 |
| Media Staff | 5 |
Location
| Location | Records |
|---|---|
| Turkey | 1329 |
| Australia | 436 |
| Canada | 379 |
| China | 368 |
| United States | 271 |
| United Kingdom | 256 |
| Indonesia | 253 |
| Taiwan | 234 |
| Netherlands | 224 |
| Spain | 218 |
| California | 215 |
What Works Clearinghouse Rating
| Rating | Records |
|---|---|
| Meets WWC Standards without Reservations | 8 |
| Meets WWC Standards with or without Reservations | 9 |
| Does not meet standards | 6 |
Cooil, Bruce; Rust, Roland T. – Psychometrika, 1995 (peer reviewed)
A proportional reduction in loss (PRL) measure for reliability of categorical data is explored for the situation in which each of "N" judges assigns a subject to one of "K" categories. The authors also demonstrate how to calculate a lower bound for reliability under more general conditions than had previously been proposed. (SLD)
Descriptors: Bayesian Statistics, Classification, Equations (Mathematics), Estimation (Mathematics)
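Cooil and Rust's PRL measure itself is not reproduced here. Purely as a point of comparison, the sketch below computes Fleiss' kappa, a different and widely used chance-corrected agreement index for the same setup of N judges assigning subjects to one of K categories. The function name `fleiss_kappa` and the 0..K-1 category coding are illustrative assumptions.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[int]], k: int) -> float:
    """Fleiss' kappa for subjects that are each rated by the same number of judges.

    NOTE: a swapped-in, well-known agreement index, NOT the Cooil & Rust PRL measure.
    ratings[i] lists the category (0..k-1) that each judge chose for subject i.
    """
    if not ratings:
        raise ValueError("need at least one rated subject")
    n_judges = len(ratings[0])
    if n_judges < 2 or any(len(r) != n_judges for r in ratings):
        raise ValueError("each subject needs the same number (>= 2) of judges")

    # counts[i][j]: how many judges placed subject i in category j
    counts = []
    for r in ratings:
        c = Counter(r)
        counts.append([c[j] for j in range(k)])
    n_subjects = len(counts)

    # Per-subject agreement: proportion of judge pairs that agree
    p_i = [
        (sum(n * n for n in row) - n_judges) / (n_judges * (n_judges - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the pooled category proportions
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_judges) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1.0 - p_e)
```

A value of 1 means perfect agreement; 0 means agreement no better than the chance level implied by the pooled category proportions.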
Petrosino, Anthony J. – Evaluation Review, 1995 (peer reviewed)
Problems and illustrations are presented from a meta-analysis of crime reduction programs. Eight criteria are specified for including studies in a meta-analysis, and some problematic studies are discussed using the criteria. Rules for handling problem studies in order to retain consistency throughout the analysis are discussed. (SLD)
Descriptors: Crime, Crime Prevention, Criteria, Meta Analysis
Samejima, Fumiko – Applied Psychological Measurement, 1994 (peer reviewed)
The reliability coefficient is predicted from the test information function (TIF) or two modified TIF formulas and a specific trait distribution. Examples illustrate the variability of the reliability coefficient across different trait distributions, and results are compared with empirical reliability coefficients. (SLD)
Descriptors: Adaptive Testing, Error of Measurement, Estimation (Mathematics), Reliability
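Samejima's specific modified TIF formulas are not reproduced here. As a minimal sketch, assuming a two-parameter logistic (2PL) item model and a normal trait distribution, one common way to predict a reliability coefficient from the test information function I(θ) is marginal reliability, var(θ) / (var(θ) + E[1/I(θ)]), with the expectation taken over the trait distribution. The item parameters and function names are hypothetical.

```python
import math

D = 1.7  # conventional scaling constant for the logistic item model


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item (discrimination a, difficulty b) at theta."""
    p = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * p * (1.0 - p)


def total_information(theta: float, items: list[tuple[float, float]]) -> float:
    """Test information function: the sum of item informations at theta."""
    return sum(item_information(theta, a, b) for a, b in items)


def marginal_reliability(
    items: list[tuple[float, float]],
    mu: float = 0.0,
    sigma: float = 1.0,
    n_points: int = 401,
    width: float = 5.0,
) -> float:
    """var(theta) / (var(theta) + E[1 / I(theta)]) for a normal(mu, sigma) trait.

    The expectation is approximated on an evenly spaced grid of theta values
    spanning mu +/- width * sigma, weighted by the normal density.
    """
    lo = mu - width * sigma
    step = 2.0 * width * sigma / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    dens = [math.exp(-0.5 * ((t - mu) / sigma) ** 2) for t in grid]
    total = sum(dens)
    # Expected error variance: average of 1 / I(theta) over the trait distribution
    mean_error_var = sum(
        (d / total) / total_information(t, items) for d, t in zip(dens, grid)
    )
    return sigma**2 / (sigma**2 + mean_error_var)


# Hypothetical 7-item test with difficulties spread around 0
ITEMS = [(1.0, b) for b in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5)]
```

Calling `marginal_reliability(ITEMS, mu=3.0)` gives a lower value than `marginal_reliability(ITEMS)`, because the items are less informative where most examinees then sit; this mirrors the abstract's point that the reliability coefficient varies across trait distributions.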
Wynkoop, Timothy F.; And Others – Journal of Child Sexual Abuse, 1995 (peer reviewed)
Examines consistency of results and variables that may confound validity in child sexual abuse cases. Concludes that the ways in which information on the incidence and prevalence of child sexual abuse has been collected, analyzed, and reported have been flawed to the extent that few reliable conclusions can be made. Includes suggestions for…
Descriptors: Child Abuse, Data Analysis, Data Collection, Higher Education
Yopp, Hallie Kay – Reading Teacher, 1995 (peer reviewed)
Describes the Yopp-Singer Test of Phoneme Segmentation, which provides teachers with a tool for assessing children's phonemic awareness and identifying children who may experience difficulty in reading and spelling. Offers evidence of its reliability and validity, and discusses its use. (SR)
Descriptors: Language Acquisition, Phonemes, Primary Education, Test Reliability
Hill, Clara E.; And Others – Journal of Counseling Psychology, 1992 (peer reviewed)
Revised the Client Verbal Response Category System by creating the client behavior system (CBS), which includes eight nominal, mutually exclusive categories. When CBS was used to rate predominant client behavior in middle sessions, adequate interjudge agreement was found, with cognitive-behavioral exploration occurring most frequently. Client experiencing…
Descriptors: Client Characteristics (Human Services), Counseling, Counseling Theories, Interrater Reliability
Loadman, William E.; And Others – Mid-Western Educational Researcher, 1991 (peer reviewed)
Three methods for grouping items in an opinion survey were compared for their utility in subscale construction: rational organization according to content, factor analysis, and multidimensional scaling. Only subscales based on factor analysis could be refined to meet the criteria of reliability, additivity, and interpretability simultaneously. (SV)
Descriptors: Attitude Measures, Comparative Analysis, Factor Analysis, Item Analysis
Ross, Donald C. – Educational and Psychological Measurement, 1992 (peer reviewed)
Large-sample chi-square tests of the significance of the difference between two correlated kappas, weighted or unweighted, are derived. Cases are presented in which the two kappas share one judge and in which they share no judge. An illustrative calculation is included. (Author/SLD)
Descriptors: Chi Square, Correlation, Equations (Mathematics), Evaluators
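Ross's chi-square test for the difference between two correlated kappas is not reproduced here. The sketch below only shows how each kappa being compared is computed for two judges, either unweighted or with linear or quadratic disagreement weights. The function name and the 0..K-1 category coding are assumptions.

```python
def cohen_kappa(r1: list[int], r2: list[int], k: int, weights: str = "none") -> float:
    """Cohen's kappa for two judges rating the same subjects into categories 0..k-1.

    weights: "none" (unweighted), "linear", or "quadratic" disagreement weights.
    Uses kappa_w = 1 - sum(w * observed) / sum(w * expected).
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(r1)

    # Observed joint proportions of (judge 1 category, judge 2 category)
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[a][b] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def w(i: int, j: int) -> float:
        # Disagreement weight for the cell (i, j); 0 on the diagonal
        if weights == "none":
            return 0.0 if i == j else 1.0
        if weights == "linear":
            return float(abs(i - j))
        if weights == "quadratic":
            return float((i - j) ** 2)
        raise ValueError(f"unknown weights: {weights}")

    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    # Expected disagreement if the judges rated independently (product of marginals)
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0.0:
        raise ValueError("kappa is undefined: no disagreement is expected by chance")
    return 1.0 - observed / expected
```

With 0/1 weights the formula reduces to the familiar (p_o - p_e) / (1 - p_e); the weighted forms give partial credit to near-misses on ordered categories.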
Alliger, George M.; Williams, Kevin J. – Educational and Psychological Measurement, 1992 (peer reviewed)
The internal consistency of a scale and various indices of rating scale response styles (such as halo, leniency, and positive or negative response bias) are related to mean scale item intercorrelation. The consequent relationship between internal consistency and rating scale response styles is discussed. (Author/SLD)
Descriptors: Correlation, Evaluators, Interrater Reliability, Rating Scales
Cordes, Anne K.; And Others – Journal of Speech and Hearing Research, 1992 (peer reviewed)
To investigate a measurement methodology based on time-interval analyses, three groups of judges (n=18) differing in their experience of judging stuttering identified stuttering events in repeated speech samples. Results showed interjudge agreement was affected by the particular speech sample, the judges' previous experience, and the length of the…
Descriptors: Evaluation Methods, Experience, Interrater Reliability, Measurement Techniques
Lee, Steven; And Others – Diagnostique, 1991
Thirty-two preschool children were administered the Cognitive Levels Test (CLT) to evaluate its temporal stability and concurrent validity. Results indicated good temporal stability for the CLT-Cognitive Index and high correlations between the CLT-Cognitive Index and the Stanford-Binet: Fourth Edition. (Author/JDD)
Descriptors: Cognitive Ability, Cognitive Tests, Intelligence Tests, Preschool Education
Stanley, Scott M.; Markham, Howard J. – Journal of Marriage and the Family, 1992 (peer reviewed)
Conducted two studies to examine relationship commitment, considered as two constructs (personal dedication and constraint commitment). Developed the Commitment Inventory, administering the initial version to 141 subjects (Study 1) and a revised version to 279 subjects (Study 2). Demonstrated that Commitment Inventory shows promise as reliable and…
Descriptors: College Students, Higher Education, Interpersonal Relationship, Test Construction
Jorgensen, Jerry D.; Petelle, John L. – Management Communication Quarterly, 1992 (peer reviewed)
Presents an overview of the CLUES (also known as the CL7) instrument. Discusses the instrument's reliability and validity and its application to organizational communication research. Suggests that the instrument demonstrates unidimensionality in low-context cultures, high reliability, and known validity in a wide array of relational types. (RS)
Descriptors: Communication Research, Higher Education, Interpersonal Communication, Organizational Communication
Zemke, Ron – Training, 1992
As the Myers-Briggs Type Indicator becomes increasingly popular as a management and training tool, criticisms of its reliability, validity, and effectiveness have arisen. Some critics see it as a way of giving people type-based excuses for substandard performance. Proponents claim it can match people with a congruent job better than any other…
Descriptors: Measures (Individuals), Personality Traits, Personnel Management, Psychological Studies
Kenny, David A. – Psychological Review, 1991 (peer reviewed)
Consensus refers to the extent to which two judges agree in rating a common target. A general model of interpersonal perception based on the weighted average model of N. H. Anderson (1981) is developed to show that increased acquaintance does not always lead to changes in consensus. (SLD)
Descriptors: Interpersonal Relationship, Interrater Reliability, Judges, Models


