Publication Date
| In 2026 | 6 |
| Since 2025 | 481 |
| Since 2022 (last 5 years) | 1960 |
| Since 2017 (last 10 years) | 4532 |
| Since 2007 (last 20 years) | 7017 |
Descriptor
| Test Reliability | 15055 |
| Test Validity | 10022 |
| Test Construction | 4374 |
| Foreign Countries | 3840 |
| Psychometrics | 2435 |
| Factor Analysis | 2302 |
| Measures (Individuals) | 1787 |
| Evaluation Methods | 1410 |
| Higher Education | 1391 |
| Questionnaires | 1264 |
| Factor Structure | 1249 |
Audience
| Researchers | 454 |
| Practitioners | 319 |
| Teachers | 128 |
| Administrators | 73 |
| Policymakers | 33 |
| Counselors | 31 |
| Students | 17 |
| Parents | 10 |
| Community | 6 |
| Support Staff | 5 |
Location
| Turkey | 840 |
| Australia | 239 |
| China | 211 |
| Canada | 207 |
| Indonesia | 163 |
| Spain | 131 |
| United States | 123 |
| United Kingdom | 121 |
| Germany | 112 |
| Taiwan | 108 |
| Netherlands | 103 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 2 |
| Meets WWC Standards with or without Reservations | 2 |
| Does not meet standards | 1 |
Peer reviewed: Burton, Richard F. – Assessment & Evaluation in Higher Education, 2001
Item-discrimination indices are numbers calculated from test data that are used in assessing the effectiveness of individual test questions. This article asserts that the indices are so unreliable as to suggest that countless good questions may have been discarded over the years. It considers how the indices, and hence overall test reliability,…
Descriptors: Guessing (Tests), Item Analysis, Test Reliability, Testing Problems
Berge, Jos M. F. Ten; Socan, Gregor – Psychometrika, 2004
To assess the reliability of congeneric tests, specifically designed reliability measures have been proposed. This paper emphasizes that such measures rely on a unidimensionality hypothesis, which can neither be confirmed nor rejected when there are only three test parts, and will invariably be rejected when there are more than three test parts.…
Descriptors: Test Reliability, Sampling, Psychometrics, Test Bias
Burns, Matthew K.; VanDerHeyden, Amanda M.; Jiban, Cynthia L. – School Psychology Review, 2006
This study compared the mathematics performance of 434 second-, third-, fourth-, and fifth-grade students to previously reported fluency and accuracy criteria using three categories of performance (frustration, instructional, and mastery). Psychometric properties of the fluency and accuracy criteria were explored and new criteria for the…
Descriptors: Reading Improvement, Criteria, Psychometrics, Grade 5
Krishnamoorthy, K.; Xia, Yanping – Multivariate Behavioral Research, 2006
The conventional approach for testing the equality of two normal mean vectors is to test first the equality of covariance matrices, and if the equality assumption is tenable, then use the two-sample Hotelling T² test. Otherwise one can use one of the approximate tests for the multivariate Behrens-Fisher problem. In this article, we…
Descriptors: Statistical Analysis, Test Reliability, Test Selection, Error Patterns
van der Meer, Jacques; Scott, Carole – Australasian Journal of Peer Learning, 2009
Much research has been done on the effectiveness of Supplemental Instruction programs (Peer Assisted Study Sessions, PASS, in Australasia). Less research has emerged on students' reasons for participating in PASS and their perceptions of the effectiveness of the program. In this article, we will report on a small improvement-focused research…
Descriptors: Foreign Countries, Peer Teaching, Student Attitudes, Student Motivation
Newton, Paul E. – Educational Research, 2009
Background: National curriculum tests have been administered in England for well over a decade. Although reliability evidence has been published, critics have argued that there is not enough evidence (of the right kind) and that test results may be insufficiently reliable. Purpose: This article collates and discusses evidence on the reliability of…
Descriptors: National Curriculum, Test Results, Foreign Countries, Elementary Secondary Education
Wang, Shun-Mei – Applied Environmental Education and Communication, 2009
The purpose of this research is to develop a performance evaluation instrument for green schools in Taiwan. The instrument is designed according to three sets of criteria: participation and partnership, reflection and learning, and ecological consideration. It also covers three operational dimensions: learning context, administration, and…
Descriptors: Environmental Education, Evaluation Criteria, Foreign Countries, Sustainable Development
Screening for Pragmatic Language Impairment: The Potential of the Children's Communication Checklist
Ketelaars, Mieke P.; Cuperus, Juliane M.; van Daal, John; Jansonius, Kino; Verhoeven, Ludo – Research in Developmental Disabilities: A Multidisciplinary Journal, 2009
The present study examines the validity of the Dutch Children's Communication Checklist (CCC) for children in kindergarten in a community sample, in order to assess the feasibility of using it as a screening instrument in the general population. Teachers completed the CCC for a representative sample of 1396 children at kindergarten level, taken…
Descriptors: Check Lists, Emotional Problems, Language Impairments, Construct Validity
Brown, William L.; And Others – 1996
This study presents psychometric characteristics of the mathematics problem solving performance assessment used in the Minneapolis Public Schools, focusing on the interrater reliability, scoring reliability, and validity of the assessment. The Minneapolis Math Problem Solving Assessment (MPSA) was established in 1991. Students are asked to solve…
Descriptors: Elementary School Students, Grade 5, Intermediate Grades, Interrater Reliability
Aycock, Tim – 1993
To determine trends in reporting test reliability, 88 articles addressing 188 instruments in 1980, 81 articles covering 205 instruments in 1985, and 67 articles assessing 195 instruments in 1990 in the "Journal of Counseling Psychology" were reviewed. Articles were examined for the way in which reliability was discussed and reported, and…
Descriptors: Educational Practices, Educational Research, Estimation (Mathematics), Interrater Reliability
McNamara, T. F.; Adams, R. J. – 1991
A preliminary study is reported of the use of new multifaceted Rasch measurement mechanisms for investigating rater characteristics in language testing. Ratings from four judges of scripts from 50 candidates taking the International English Language Testing System test, a test of English for Academic Purposes, are analyzed. The analysis…
Descriptors: Comparative Analysis, English (Second Language), Foreign Countries, Interrater Reliability
Kane, Michael T.; Brennan, Robert L. – 1977
A large number of seemingly diverse coefficients have been proposed as indices of dependability, or reliability, for domain-referenced and/or mastery tests. In this paper, it is shown that most of these indices are special cases of two generalized indices of agreement: one that is corrected for chance, and one that is not. The special cases of…
Descriptors: Bayesian Statistics, Correlation, Criterion Referenced Tests, Cutting Scores
Peer reviewed: Epstein, Michael H.; Nieminen, Gayla S. – School Psychology Review, 1983
Teachers and classroom aides of learning disabled students completed the Conners Abbreviated Teacher Rating Scale (CATRS) on two separate occasions. The study investigated the inter-rater and intra-rater reliability of this instrument. CATRS appeared to have sufficient reliability to recommend its continued frequent use. (Author/DWH)
Descriptors: Behavior Rating Scales, Elementary Education, Elementary School Students, Hyperactivity
Peer reviewed: Pugh, Malcolm; Lock, Roger – Research in Science and Technological Education, 1989
The development of a framework for analyzing pupil talk is described, and the reliability of scoring transcribed conversations using the framework is discussed. Definitions and examples of the terms used in the framework are appended. (Author/YP)
Descriptors: Biology, Foreign Countries, Group Discussion, Interrater Reliability
Peer reviewed: Sigafoos, Jeff; Pennell, Donna – Education and Training in Mental Retardation and Developmental Disabilities, 1995
Comparison using paired t-tests of parent and teacher ratings for 16 preschool children on the Receptive-Expressive Emergent Language Scale found no significant differences between parent and teacher ratings of expressive language, but a significant difference on the receptive language subscale. However, interrater reliability was relatively low…
Descriptors: Developmental Disabilities, Expressive Language, Interrater Reliability, Language Skills

