Publication Date
| In 2026 | 3 |
| Since 2025 | 666 |
| Since 2022 (last 5 years) | 3167 |
| Since 2017 (last 10 years) | 7408 |
| Since 2007 (last 20 years) | 15046 |
Descriptor
| Test Reliability | 15036 |
| Test Validity | 10272 |
| Reliability | 9759 |
| Foreign Countries | 7141 |
| Test Construction | 4823 |
| Validity | 4191 |
| Measures (Individuals) | 3877 |
| Factor Analysis | 3825 |
| Psychometrics | 3525 |
| Interrater Reliability | 3124 |
| Correlation | 3039 |
Audience
| Researchers | 709 |
| Practitioners | 451 |
| Teachers | 208 |
| Administrators | 122 |
| Policymakers | 66 |
| Counselors | 42 |
| Students | 38 |
| Parents | 11 |
| Community | 7 |
| Support Staff | 6 |
| Media Staff | 5 |
Location
| Turkey | 1327 |
| Australia | 436 |
| Canada | 379 |
| China | 368 |
| United States | 271 |
| United Kingdom | 256 |
| Indonesia | 252 |
| Taiwan | 234 |
| Netherlands | 223 |
| Spain | 216 |
| California | 214 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 8 |
| Meets WWC Standards with or without Reservations | 9 |
| Does not meet standards | 6 |
Peer reviewed: Hansford, B. C.; Hattie, J. A. – Review of Educational Research, 1982
A meta-analysis of 128 studies of the relationship between self and achievement/performance measures reported correlations in the range of -.77 to .96 with an "average" correlation of .21. This average relationship was modified by several variables, including, among others, grade level of subjects, socioeconomic status, ethnicity, and…
Descriptors: Achievement Tests, Correlation, Educational Attainment, Elementary Secondary Education
Peer reviewed: Vrancic, Daniela; Nanclares, Valeria; Soares, Delfina; Kulesz, Analia; Mordzinski, Claudia; Plebst, Christian; Starkstein, Sergio – Journal of Autism and Developmental Disorders, 2002
A study involving 30 Argentineans with autism evaluated the validity of the Autism Diagnostic Inventory-Telephone Screening in Spanish (ADI-TSS). The final version of the ADI-TSS could be administered in 20 to 40 minutes and demonstrated high validity, high interrater reliability, and high internal consistency. (Contains references.) (Author/CR)
Descriptors: Adults, Autism, Disability Identification, Foreign Countries
Peer reviewed: Anderson, Stephen A. – Michigan Reading Journal, 2002
Considers the development of an inter-rater reliability correlation comparing the judgments, or scores, of each judge to see if their observations are similar. Presents a case study of the Northville Public Schools' data for the 2000 MEAP (Michigan Educational Assessment Program) Writing Test. Concludes that in this case study the state fails both…
Descriptors: Case Studies, Elementary Education, Evaluation Research, Interrater Reliability
Peer reviewed: Ryser, Gail R. – Journal of Secondary Gifted Education, 1994
The meanings of reliability and validity as they apply to standardized measures are used as a framework for applying the concepts of reliability and validity to authentic assessments. This article sees reliability as scorability and stability, whereas validity is seen as students' ability to use knowledge authentically in the field. (DB)
Descriptors: Elementary Secondary Education, Evaluation Methods, Performance Based Assessment, Reliability
Peer reviewed: Lewis, Kerry E. – American Journal of Speech-Language Pathology, 1995
An examination of the extent to which scores on the Stuttering Severity Instrument (SSI) for Children and Adults, Third Edition, accurately reflect 10 judges' observations of stuttering behaviors found that SSI scores obscured the wide range of judges' raw counts and did not accurately reflect the observational data from which they were derived.…
Descriptors: Adults, Children, Evaluation Methods, Interrater Reliability
Peer reviewed: Simpson, Robert G. – Behavioral Disorders, 1991
Each of 120 students in grades 9-12 had their behavior rated by 2 of their teachers using the Revised Behavior Problem Checklist. Results indicated a generally low to moderate degree of relationship among teacher ratings. It is recommended that clinicians collect behavioral ratings from many raters before reaching diagnostic conclusions.…
Descriptors: Behavior Problems, Check Lists, Clinical Diagnosis, Interrater Reliability
Peer reviewed: McWilliam, R. A.; Ware, William B. – Journal of Early Intervention, 1994
Forty-seven young children, 15 with disabilities, were observed 4 times for types and levels of engagement. Results indicated that engagement is difficult to measure through molecular data collection techniques because of error in dependability measures. The number of observed sessions could be increased to achieve generalizability, but increases…
Descriptors: Attention, Classroom Observation Techniques, Data Collection, Disabilities
Mabry, Linda – Phi Delta Kappan, 1999
Education remains heavily shackled by punitive, test-driven reform. Despite reasonable alternatives, testing increasingly drives educational accountability and reform. Standardization of direct writing assessments promotes scoring reliability and facilitates educational comparisons and rankings. However, standardized writing is not good writing,…
Descriptors: Elementary Secondary Education, Interrater Reliability, Performance Based Assessment, Scoring Rubrics
Peer reviewed: Nordin, Viviann; Gillberg, Christopher; Nyden, Agneta – Journal of Autism and Developmental Disorders, 1998
This study assessed the interrater reliability of a Swedish version of the Childhood Autism Rating Scale (CARS), an instrument for screening and diagnosis of autism. Two investigators used the CARS to rate autistic behavior in 25 children. Results indicated fair to excellent agreement. Aspects of validity and reliability are discussed.…
Descriptors: Autism, Behavior Rating Scales, Clinical Diagnosis, Disability Identification
Matson, Johnny L.; Laud, Rinita B.; Gonzalez, Melissa L.; Malone, Carrie J.; Swender, Stephen L. – Research in Developmental Disabilities: A Multidisciplinary Journal, 2005
The use of anti-epileptic medications (AEDs) is much higher in individuals with intellectual disabilities than in the general population. As many of these individuals rely on such medications, clinicians should consider psychometrically sound instruments for assessing adverse side effects of these medications as one aspect of routine clinical…
Descriptors: Evaluation Methods, Seizures, Epilepsy, Developmental Disabilities
Murphy, Elizabeth; Ciszewska-Carr, Justyna – International Review of Research in Open and Distance Learning, 2005
This paper reports on a case study which identifies and illustrates sources of difference in agreement in relation to reliability in a context of quantitative content analysis of a transcript of an online asynchronous discussion (OAD). Transcripts of 10 students in a month-long online asynchronous discussion were coded by two coders using an…
Descriptors: Computer Mediated Communication, Content Analysis, Reliability, Case Studies
Assessing the Evidence: Different Types of NVQ Evidence and Their Impact on Reliability and Fairness
Greatorex, Jackie – Journal of Vocational Education and Training, 2005
The research literature reveals that there are many factors that influence the consistency of assessors' or examiners' judgements. One issue that has not been considered is whether National Vocational Qualifications assessors' consistency of judgement is affected by different types of evidence. In this article, 15 Customer Service and 12 Assessor…
Descriptors: Qualifications, Examiners, Interrater Reliability, Job Applicants
Olswang, Lesley B.; Svensson, Liselotte; Coggins, Truman E.; Beilinson, Jill S.; Donaldson, Amy L. – Journal of Speech, Language, and Hearing Research, 2006
Purpose: To explore the utility of time-interval analysis for documenting the reliability of coding social communication performance of children in classroom settings. Of particular interest was finding a method for determining whether independent observers could reliably judge both occurrence and duration of ongoing behavioral dimensions for…
Descriptors: Reliability, Coding, Intervals, Kindergarten
Setzer, J. Carl – GED Testing Service, 2009
The GED® English as a Second Language (GED ESL) Test was designed to serve as an adjunct to the GED test battery when an examinee takes either the Spanish- or French-language version of the tests. The GED ESL Test is a criterion-referenced, multiple-choice instrument that assesses the functional, English reading skills of adults whose first…
Descriptors: Language Tests, High School Equivalency Programs, Psychometrics, Reading Skills
Ezzelle, Carol; Setzer, J. Carl – GED Testing Service, 2009
This manual was written to provide technical information regarding the 2002 Series GED (General Educational Development) Tests. Throughout this manual, documentation is provided regarding the development of the GED Tests, data collection activities, as well as reliability and validity evidence. The purpose of this manual is to provide evidence…
Descriptors: High School Equivalency Programs, Testing Programs, Test Validity, Test Reliability

