Muller, Jorg M. – Educational and Psychological Measurement, 2006
A new test index is defined as the probability that two randomly selected test scores are statistically different (PDTS). After a conceptual definition of the index is given, two simulation studies are presented. The first analyzes the influence of the distribution of test scores, test reliability, and sample size on PDTS within classical…
Descriptors: Test Reliability, Probability, Scores, Item Response Theory
Wise, Lauress L. – Educational Measurement: Issues and Practice, 2006
Uses and consequences of educational testing have increased dramatically in recent years. Professional standards to ensure fair treatment of all affected by test results are more important than ever, but standards for developing and using educational tests are only helpful if they are followed. Test developers and users each have a role to play in…
Descriptors: Educational Testing, Standards, Accountability, Cooperation
Efklides, Anastasia – Educational Psychology Review, 2006
The measurement of online self-regulation processes is an important issue. In this rejoinder to Ainley and Patrick (this issue), I argue that including measures of metacognitive experiences, in conjunction with measures of other affective experiences, in various phases of task processing can increase the reliability and validity of…
Descriptors: Metacognition, Learning Processes, Reader Response, Self Management
Schuster, Christof – Educational and Psychological Measurement, 2004
This article presents a formula for weighted kappa in terms of rater means, rater variances, and the rater covariance. The formula is particularly helpful in emphasizing that weighted kappa is an absolute agreement measure, in the sense that it is sensitive to differences in the raters' marginal distributions. Specifically, rater mean differences will decrease…
Descriptors: Computation, Rating Scales, Interrater Reliability, Statistical Analysis
Shields, Alan L.; Caruso, John C. – Educational and Psychological Measurement, 2004
The CAGE is a commonly used alcohol screening instrument. Although considerable work has been done on the validity of CAGE scores, relatively little information is available on their reliability. Reliability induction and generalization studies were performed for the CAGE. Of the 259 studies available for analysis, only 19 (7.3%) contained…
Descriptors: Logical Thinking, Generalization, Test Reliability, Questionnaires
Gierl, Mark J.; Gotzmann, Andrea; Boughton, Keith A. – Applied Measurement in Education, 2004
Differential item functioning (DIF) analyses are used to identify items that operate differently between two groups, after controlling for ability. The Simultaneous Item Bias Test (SIBTEST) is a popular DIF detection method that matches examinees on a true score estimate of ability. However, in some testing situations, like test translation and…
Descriptors: True Scores, Simulation, Test Bias, Student Evaluation
Wood, Heather M.; Baumgartner, Ted A. – Measurement in Physical Education and Exercise Science, 2004
The revised push-up test has been found to have good validity, but it produces many zero scores for women, so an alternative may be needed for college-age women. The purpose of this study was to determine the objectivity, reliability, and validity for the bent-knee push-up test (executed on hands and knees) for…
Descriptors: Body Weight, Athletics, Females, Predictive Validity
Einarsdottir, Johanna; Ingham, Roger J. – American Journal of Speech-Language Pathology, 2005
Purpose: This article critically reviews evidence to determine whether the use of disfluency typologies, such as "syllable repetitions" or "prolongations", has assisted the understanding or treatment of developmental stuttering. Consideration is given to whether there is a need for a fundamental shift in the basis for constructing measures of…
Descriptors: Stuttering, Measures (Individuals), Evidence, Test Reliability
MacCann, Robert G. – Psychometrika, 2004
For (0, 1) scored multiple-choice tests, a formula giving test reliability as a function of the number of item options is derived, assuming the "knowledge or random guessing model," the parallelism of the new and old tests (apart from the guessing probability), and the assumptions of classical test theory. It is shown that the formula is a more…
Descriptors: Guessing (Tests), Multiple Choice Tests, Test Reliability, Test Theory
Zeanah, Charles H.; Scheeringa, Michael; Boris, Neil W.; Heller, Sherryl S.; Smyke, Anna T.; Trapani, Jennifer – Child Abuse & Neglect: The International Journal, 2004
Objective: To determine if Reactive Attachment Disorder (RAD) can be reliably identified in maltreated toddlers in foster care, if the two types of RAD are independent, and to estimate the prevalence of RAD in these maltreated toddlers. Methods: Clinicians treating 94 maltreated toddlers in foster care were interviewed regarding signs of…
Descriptors: Attachment Behavior, Behavior Disorders, Toddlers, Child Abuse
Nunes, Terezinha; Pretzlik, Ursula; Ilicak, Selin – Journal of Deaf Studies and Deaf Education, 2005
This paper analyzes the reliability and validity of a questionnaire designed by Archbold, Lutman, Gregory, O'Neill, and Nikolopoulos (2002) for the assessment of pediatric cochlear implantation. Parents of 61 youngsters (age range 5 to 16 years), who had had the implant for at least 3 years, responded to the questionnaire and to an interview. The alpha…
Descriptors: Questionnaires, Pediatrics, Assistive Technology, Reliability
Krippendorff, Klaus – Human Communication Research, 2004
In a recent article in this journal, Lombard, Snyder-Duch, and Bracken (2002) surveyed 200 content analyses for their reporting of reliability tests, compared the virtues and drawbacks of five popular reliability measures, and proposed guidelines and standards for their use. Their discussion revealed that numerous misconceptions circulate in the…
Descriptors: Misconceptions, Content Analysis, News Reporting, Measurement Techniques
Campbell, Jonathan M. – Journal of Autism and Developmental Disorders, 2005
Five rating scales for screening and detection of Asperger's Disorder, three commercially available and two research instruments, are evaluated with reference to psychometric criteria outlined by Bracken in 1987 ("Journal of Psychoeducational Assessment," 4, 313). Reliability and validity data reported in examiner's manuals or published reports…
Descriptors: Diagnostic Tests, Asperger Syndrome, Rating Scales, Clinical Diagnosis
de Bildt, Annelies; Kraijer, Dirk; Sytema, Sjoerd; Minderaa, Ruud – Journal of Autism and Developmental Disorders, 2005
The psychometric properties of the Vineland Adaptive Behavior Scales Survey Form were studied in a total population of children and adolescents with MR, and in the specific levels of functioning (n=826, age 4-18 years). The original division into (sub)domains, as assigned by the authors, was replicated in the total population and in the mild and…
Descriptors: Psychometrics, Measures (Individuals), Children, Adolescents
Leach, Lesley F.; Henson, Robin K.; Odom, Leslie R.; Cagle, Lynne S. – Educational and Psychological Measurement, 2006
The use of reliability generalization methodology promises to, among other things, inform researchers about the importance of reporting reliability coefficients and their use in result interpretation. This study presents results from a reliability generalization study of the Self-Description Questionnaire (SDQ). The average score reliabilities…
Descriptors: Reliability, Questionnaires, Research Methodology, Scores

