[Peer reviewed] Quereshi, M. Y.; And Others – Journal of Clinical Psychology, 1984
Administered the Wechsler Intelligence Scale for Children, Wechsler Intelligence Scale for Children-Revised, and Wechsler Preschool and Primary Scale of Intelligence in a counterbalanced design to randomly selected elementary school children (N=72). Results indicated that the verbal Intelligence Quotients (IQs) were comparable, but the performance and…
Descriptors: Comparative Testing, Elementary Education, Elementary School Students, Intelligence Tests
[Peer reviewed] Willson, Victor L.; Reynolds, Cecil R. – Educational and Psychological Measurement, 1984
Samples in research on individual and group differences may be selected based on whole scores which differ from the population mean. Children are diagnosed in clinical practice with a whole score. These procedures produce regression to the population mean which can affect accuracy and adequacy of part score interpretations. (Author/DWH)
Descriptors: Correlation, Intelligence Tests, Profiles, Scores
[Peer reviewed] Brown, Linda; Bryant, Brian R. – Remedial and Special Education (RASE), 1984
The article reviews Consumer's Guide to Tests in Print, noting its purpose (to provide objective information about technical characteristics of standardized tests); its criteria for evaluating standardization, reliability, and validity; and its rating system based on evaluations of selected review panel members. (CL)
Descriptors: Elementary Secondary Education, Standardized Tests, Test Construction, Test Reliability
[Peer reviewed] Bryson, Susan E.; Pilon, David J. – Journal of Clinical Psychology, 1984
Carried out four experiments in which male and female undergraduates (N=384) completed the Beck Depression Inventory under conditions ranging from absolute anonymity to a face-to-face interview. Results showed no evidence that depression is more severe or common in females. Responses appeared essentially unaffected by method of administration.…
Descriptors: College Students, Depression (Psychology), Foreign Countries, Higher Education
[Peer reviewed] Schilling, Lee H. – Journal of American College Health, 1984
Information was gathered from college students to observe choice of postcoital contraception (PCC) and the effectiveness of the choice. Results indicate that PCC is an effective second chance to prevent unintended pregnancy. Research methodology is presented. (Author/DF)
Descriptors: Abortions, College Students, Contraception, Females
[Peer reviewed] Hale, Gordon A.; And Others – Language Learning, 1983
Addresses the issue of whether test scores are affected by the prior availability of the items on a test. Concludes that, while disclosing items significantly affects test scores, the magnitude of the disclosure effect decreases with an increase in the size of the disclosed pool. (EKN)
Descriptors: English (Second Language), Language Tests, Scores, Second Language Learning
[Peer reviewed] Wilcox, Rand R. – Journal of Educational Statistics, 1983
The problem of determining which of several populations has the largest mean is considered. The procedure described by Dudewicz and Dalal is extended to the case of unequal sample sizes. (JKS)
Descriptors: Analysis of Variance, Nonparametric Statistics, Probability, Reliability
[Peer reviewed] O'Donnell, Michael P.; Wood, Margo – Journal of Reading, 1984
Concludes that The London Procedure does not reflect contemporary research in the fields of literacy acquisition and learning disabilities. (AEA)
Descriptors: Adult Basic Education, Adult Literacy, Reading Diagnosis, Test Reliability
[Peer reviewed] Dowaliby, Fred J.; And Others – American Annals of the Deaf, 1983
The Locus of Control Inventory for the Deaf (LCID), consisting of a 23-item Likert-like scale, and two commonly used scales to assess locus of control in hearing persons were administered to 174 deaf freshman students. Intercorrelation findings demonstrated greater soundness of the LCID as compared with the other scales. (Author)
Descriptors: Correlation, Deafness, Locus of Control, Measures (Individuals)
[Peer reviewed] Hopkins, Kenneth D. – Journal of Special Education, 1983
This article illustrates the use of generalizability theory in special education to estimate the reliability of a measure when there is more than one source of error in the universe of inference, and shows how the effects of changing the number of items and/or raters can be evaluated. (Author)
Descriptors: Generalization, Item Analysis, Mathematics, Research Methodology
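As a rough illustration of the kind of evaluation the Hopkins abstract describes, the sketch below computes a relative generalizability coefficient for a fully crossed persons x items x raters design and shows how it changes with the number of items and raters. The function name and the variance components are hypothetical values chosen for illustration; they are not taken from the article, and the article's own design may differ.

```python
def g_coefficient(var_p, var_pi, var_pr, var_pir_e, n_items, n_raters):
    """Relative generalizability coefficient for a crossed p x i x r design.

    var_p      : variance due to persons (the object of measurement)
    var_pi     : person-by-item interaction variance
    var_pr     : person-by-rater interaction variance
    var_pir_e  : three-way interaction confounded with residual error
    """
    # Relative error variance shrinks as more items and raters are averaged.
    rel_error = (
        var_pi / n_items
        + var_pr / n_raters
        + var_pir_e / (n_items * n_raters)
    )
    return var_p / (var_p + rel_error)


# Hypothetical variance components, for illustration only.
components = dict(var_p=0.50, var_pi=0.20, var_pr=0.10, var_pir_e=0.40)

for n_items, n_raters in [(5, 1), (10, 1), (10, 2), (20, 2)]:
    coef = g_coefficient(n_items=n_items, n_raters=n_raters, **components)
    print(f"items={n_items:2d} raters={n_raters}: G = {coef:.3f}")
```

With several error sources, adding raters and adding items reduce different terms of the error variance, which is why the two are evaluated separately rather than through a single reliability figure.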
[Peer reviewed] Wood, R.; Quinn, B. – Educational Review, 1976
Impression marking of English Language essay and summary questions by pairs of examiners is shown, as expected, to be more reliable than single marking. Given the limited statistical information available, it is concluded that pairing of examiners can as well be done by random or quasi-random means as by attempts at calculated matching. (Editor/RK)
Descriptors: Bias, Educational Research, Essay Tests, Examiners
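The expectation that paired marking is more reliable than single marking can be illustrated with the Spearman-Brown formula. This is a minimal sketch that assumes the examiners behave as parallel raters; it is not necessarily the analysis used by Wood and Quinn, and the single-marker reliability of 0.60 is a hypothetical value.

```python
def spearman_brown(single_reliability, n_raters):
    """Projected reliability of the average of n parallel raters."""
    r = single_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


# Hypothetical single-marker reliability, for illustration only.
single = 0.60
paired = spearman_brown(single, n_raters=2)
print(f"single marker: {single:.2f}, pair of markers: {paired:.2f}")
```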
Armstrong, David G. – Educational Technology, 1976
A procedure designed to help instructional decision-makers evaluate individual sources of diagnostic information in terms of their functional utility may add a desirable measure of precision to their instructional prescriptions for learners. (Author)
Descriptors: Conceptual Schemes, Decision Making, Diagnostic Teaching, Information Processing
[Peer reviewed] Schwab, Donald P.; And Others – Personnel Psychology, 1975
Recently, an evaluation procedure, behaviorally anchored rating scales (BARS), has been developed that attempts to capture performance in multidimensional, behavior-specific terms. The article reviews and evaluates the research on BARS and suggests new directions for future research. (Author/RK)
Descriptors: Behavior Rating Scales, Performance Criteria, Psychological Studies, Tables (Data)
[Peer reviewed] Hood, Joyce – Reading Research Quarterly, 1975
Descriptors: Evaluation Methods, Higher Education, Miscue Analysis, Oral Reading
Zhang, Yanwei; Breithaupt, Krista; Tessema, Aster; Chuah, David – Online Submission, 2006
This study considered two IRT-based procedures for estimating test reliability for a certification exam that used both adaptive (via an MST model) and non-adaptive designs. Both procedures rely on calibrated item parameters to estimate error variance. In terms of score variance, one procedure (Method 1) uses the empirical ability distribution…
Descriptors: Individual Testing, Test Reliability, Programming, Error of Measurement


