Showing all 10 results
Peer reviewed
Hogan, Thomas; DeStefano, Marissa; Gilby, Caitlin; Kosman, Dana; Peri, Joshua – Applied Measurement in Education, 2021
Buros' "Mental Measurements Yearbook (MMY)" has provided professional reviews of commercially published psychological and educational tests for over 80 years. It serves as a kind of conscience for the testing industry. For a random sample of 50 entries in the "19th MMY" (a total of 100 separate reviews), this study determined…
Descriptors: Test Reviews, Interrater Reliability, Psychological Testing, Educational Testing
Peer reviewed
Stefanie A. Wind; Yangmeng Xu – Educational Assessment, 2024
We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We used a simulation study to consider how the three approaches impact the psychometric properties of student achievement estimates, with an emphasis on person fit. We found…
Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Examiners
Peer reviewed
Beck, Klaus – Frontline Learning Research, 2020
Many test developers try to ensure the content validity of their tests by having external experts review the items, e.g. in terms of relevance, difficulty, or clarity. Although this approach is widely accepted, a closer look reveals several pitfalls that need to be avoided if experts' advice is to be truly helpful. The purpose of this paper is to…
Descriptors: Content Validity, Psychological Testing, Educational Testing, Student Evaluation
Hixson, Nate; Rhudy, Vaughn – West Virginia Department of Education, 2013
Student responses to the West Virginia Educational Standards Test (WESTEST) 2 Online Writing Assessment are scored by a computer-scoring engine. The scoring method is not widely understood among educators, and there exists a misperception that it is not comparable to hand scoring. To address these issues, the West Virginia Department of Education…
Descriptors: Scoring Formulas, Scoring Rubrics, Interrater Reliability, Test Scoring Machines
Tennessee Department of Education, 2012
In the summer of 2011, the Tennessee Department of Education contracted with the National Institute for Excellence in Teaching (NIET) to provide a four-day training for all evaluators across the state. NIET trained more than 5,000 evaluators intensively in the state model (districts using alternative instruments delivered their own training)…
Descriptors: Video Technology, Feedback (Response), Evaluators, Interrater Reliability
Peer reviewed
Longford, N. T. – Journal of Educational and Behavioral Statistics, 1994
Presents a model-based approach to rater reliability for essays read by multiple raters. The approach is motivated by generalizability theory, and variation of rater severity and rater inconsistency is considered in the presence of between-examinee variations. Illustrates methods with data from standardized educational tests. (Author/SLD)
Descriptors: Educational Testing, Essay Tests, Generalizability Theory, Interrater Reliability
Peer reviewed
Huot, Brian – College Composition and Communication, 1990
Describes holistic scoring as one of the biggest breakthroughs in writing assessment. Suggests that the technique's high interrater reliability coefficients partly explain holistic scoring's popularity. Argues that validity has been largely neglected. Concludes that more must be learned about the uses and effects of holistic scoring. (SG)
Descriptors: Educational Testing, Higher Education, Holistic Approach, Holistic Evaluation
Peer reviewed
Mills, Janet – Bulletin of the Council for Research in Music Education, 1987
Questions the extent to which assessment of solo musical performance can be made under the General Certificate of Secondary Education (GCSE) examination in England and Wales. Discusses performances as a criterion. Reports on an experiment which attempted to assess a student's overall music performance. Offers a model which can be used to better measure solo music…
Descriptors: Educational Research, Educational Testing, Foreign Countries, Interrater Reliability
Peer reviewed
Harrison, Patti L. – Journal of Special Education, 1987
Part of a special issue on adaptive behavior, the article reviews adaptive behavior research in areas that include the relationship between adaptive behavior and intelligence and school achievement, the relationship between different measures of adaptive behavior, predictive aspects, declassification, group differences in adaptive behavior,…
Descriptors: Academic Achievement, Adaptive Behavior (of Disabled), Behavior Rating Scales, Comparative Analysis
Peer reviewed
Curren, Randall R. – Theory and Research in Education, 2004
This article addresses the capacity of high stakes tests to measure the most significant kinds of learning. It begins by examining a set of philosophical arguments pertaining to construct validity and alleged conceptual obstacles to attributing specific knowledge and skills to learners. The arguments invoke philosophical doctrines of holism and…
Descriptors: Test Items, Educational Testing, Construct Validity, High Stakes Tests