Showing all 10 results
Peer reviewed
Hogan, Thomas; DeStefano, Marissa; Gilby, Caitlin; Kosman, Dana; Peri, Joshua – Applied Measurement in Education, 2021
Buros' "Mental Measurements Yearbook (MMY)" has provided professional reviews of commercially published psychological and educational tests for over 80 years. It serves as a kind of conscience for the testing industry. For a random sample of 50 entries in the "19th MMY" (a total of 100 separate reviews), this study determined…
Descriptors: Test Reviews, Interrater Reliability, Psychological Testing, Educational Testing
Peer reviewed
Stefanie A. Wind; Yangmeng Xu – Educational Assessment, 2024
We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We used a simulation study to consider how the three approaches impact the psychometric properties of student achievement estimates, with an emphasis on person fit. We found…
Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Examiners
Peer reviewed
Beck, Klaus – Frontline Learning Research, 2020
Many test developers try to ensure the content validity of their tests by having external experts review the items, e.g. in terms of relevance, difficulty, or clarity. Although this approach is widely accepted, a closer look reveals several pitfalls that need to be avoided if experts' advice is to be truly helpful. The purpose of this paper is to…
Descriptors: Content Validity, Psychological Testing, Educational Testing, Student Evaluation
Hixson, Nate; Rhudy, Vaughn – West Virginia Department of Education, 2013
Student responses to the West Virginia Educational Standards Test (WESTEST) 2 Online Writing Assessment are scored by a computer-scoring engine. The scoring method is not widely understood among educators, and there exists a misperception that it is not comparable to hand scoring. To address these issues, the West Virginia Department of Education…
Descriptors: Scoring Formulas, Scoring Rubrics, Interrater Reliability, Test Scoring Machines
Tennessee Department of Education, 2012
In the summer of 2011, the Tennessee Department of Education contracted with the National Institute for Excellence in Teaching (NIET) to provide a four-day training for all evaluators across the state. NIET trained more than 5,000 evaluators intensively in the state model (districts using alternative instruments delivered their own training)…
Descriptors: Video Technology, Feedback (Response), Evaluators, Interrater Reliability
Peer reviewed
Longford, N. T. – Journal of Educational and Behavioral Statistics, 1994
Presents a model-based approach to rater reliability for essays read by multiple raters. The approach is motivated by generalizability theory, and variation of rater severity and rater inconsistency is considered in the presence of between-examinee variations. Illustrates methods with data from standardized educational tests. (Author/SLD)
Descriptors: Educational Testing, Essay Tests, Generalizability Theory, Interrater Reliability
Peer reviewed
Huot, Brian – College Composition and Communication, 1990
Describes holistic scoring as one of the biggest breakthroughs in writing assessment. Suggests that the technique's high interrater reliability coefficients partly explain holistic scoring's popularity. Argues that validity has been largely neglected. Concludes that more must be learned about the uses and effects of holistic scoring. (SG)
Descriptors: Educational Testing, Higher Education, Holistic Approach, Holistic Evaluation
Peer reviewed
Mills, Janet – Bulletin of the Council for Research in Music Education, 1987
Questions the extent to which assessment of solo musical performance can be made under the General Certificate of Secondary Education (GCSE) examination in England and Wales. Discusses performances as a criterion. Reports on an experiment which attempted to assess a student's overall music performance. Offers a model which can be used to better measure solo music…
Descriptors: Educational Research, Educational Testing, Foreign Countries, Interrater Reliability
Peer reviewed
Harrison, Patti L. – Journal of Special Education, 1987
Part of a special issue on adaptive behavior, the article reviews adaptive behavior research in areas that include the relationship between adaptive behavior and intelligence and school achievement, the relationship between different measures of adaptive behavior, predictive aspects, declassification, group differences in adaptive behavior,…
Descriptors: Academic Achievement, Adaptive Behavior (of Disabled), Behavior Rating Scales, Comparative Analysis
Peer reviewed
Curren, Randall R. – Theory and Research in Education, 2004
This article addresses the capacity of high stakes tests to measure the most significant kinds of learning. It begins by examining a set of philosophical arguments pertaining to construct validity and alleged conceptual obstacles to attributing specific knowledge and skills to learners. The arguments invoke philosophical doctrines of holism and…
Descriptors: Test Items, Educational Testing, Construct Validity, High Stakes Tests