ERIC - Search Results

Showing 1 to 15 of 22 results

Examining the Psychometric Impact of Targeted and Random Double-Scoring in Mixed-Format Assessments

Peer reviewed

Xu, Yangmeng; Wind, Stefanie A. – Educational Measurement: Issues and Practice, 2025

Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…

Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods

A Rubric for the Detection of Students in Crisis

Peer reviewed

Burkhardt, Amy; Lottridge, Susan; Woolf, Sherri – Educational Measurement: Issues and Practice, 2021

For some students, standardized tests serve as a conduit to disclose sensitive issues of harm or distress that may otherwise go unreported. By detecting this writing, known as "crisis papers," testing programs have a unique opportunity to assist in mitigating the risk of harm to these students. The use of machine learning to…

Descriptors: Scoring Rubrics, Identification, At Risk Students, Standardized Tests

Boolean Analysis of Interobserver Agreement: Formal and Functional Evidence Sampling in Complex Coding Endeavors

Peer reviewed

Solano-Flores, Guillermo – Educational Measurement: Issues and Practice, 2021

This article proposes a Boolean approach to representing and analyzing interobserver agreement in dichotomous coding. Building on the notion that observations are samples of a universe of observations, it submits that coding can be viewed as a process in which observers sample pieces of evidence on constructs. It distinguishes between formal and…

Descriptors: Online Searching, Coding, Interrater Reliability, Evidence

On the Superior Statistical Properties of Frequency Scales in Job Analyses

Peer reviewed

Babcock, Ben; Risk, Nicole M.; Wyse, Adam E. – Educational Measurement: Issues and Practice, 2020

This study compared the statistical properties of four job analysis task survey response scale types: criticality, difficulty in learning, importance, and frequency. We used nine job analysis studies spanning two fields, medical imaging and allied health professionals, to compare the job analysis scales in terms of variability and interrater…

Descriptors: Job Analysis, Radiology, Allied Health Personnel, Surveys

Digital Module 12: Think-Aloud Interviews and Cognitive Labs https://ncme.elevate.commpartners.com

Peer reviewed

Leighton, Jacqueline P.; Lehman, Blair – Educational Measurement: Issues and Practice, 2020

In this digital ITEMS module, Dr. Jacqueline Leighton and Dr. Blair Lehman review differences between think-aloud interviews to measure problem-solving processes and cognitive labs to measure comprehension processes. Learners are introduced to historical, theoretical, and procedural differences between these methods and how to use and analyze…

Descriptors: Protocol Analysis, Interviews, Problem Solving, Cognitive Processes

Methodologies for Investigating and Interpreting Student-Teacher Rating Incongruence in Noncognitive Assessment

Peer reviewed

Flake, Jessica Kay; Petway, Kevin Terrance, II – Educational Measurement: Issues and Practice, 2019

Numerous studies merely note divergence in students' and teachers' ratings of student noncognitive constructs. However, given the increased attention and use of these constructs in educational research and practice, an in-depth study focused on this issue was needed. Using a variety of quantitative methodologies, we thoroughly investigate…

Descriptors: Teachers, Students, Achievement Rating, Interrater Reliability

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

Rater Agreement in Test-to-Curriculum Alignment Reviews

Peer reviewed

Traynor, A.; Merzdorf, H. E. – Educational Measurement: Issues and Practice, 2018

During the development of large-scale curricular achievement tests, recruited panels of independent subject-matter experts use systematic judgmental methods--often collectively labeled "alignment" methods--to rate the correspondence between a given test's items and the objective statements in a particular curricular standards document.…

Descriptors: Achievement Tests, Expertise, Alignment (Education), Test Items

Gauging Item Alignment through Online Systems While Controlling for Rater Effects

Peer reviewed

Anderson, Daniel; Irvin, Shawn; Alonzo, Julie; Tindal, Gerald A. – Educational Measurement: Issues and Practice, 2015

The alignment of test items to content standards is critical to the validity of decisions made from standards-based tests. Generally, alignment is determined based on judgments made by a panel of content experts with either ratings averaged or via a consensus reached through discussion. When the pool of items to be reviewed is large, or the…

Descriptors: Test Items, Alignment (Education), Standards, Online Systems

Uncovering Multivariate Structure in Classroom Observations in the Presence of Rater Errors

Peer reviewed

McCaffrey, Daniel F.; Yuan, Kun; Savitsky, Terrance D.; Lockwood, J. R.; Edelen, Maria O. – Educational Measurement: Issues and Practice, 2015

We examine the factor structure of scores from the CLASS-S protocol obtained from observations of middle school classroom teaching. Factor analysis has been used to support both interpretations of scores from classroom observation protocols, like CLASS-S, and the theories about teaching that underlie them. However, classroom observations contain…

Descriptors: Factor Structure, Multivariate Analysis, Scores, Factor Analysis

A Framework for Evaluation and Use of Automated Scoring

Peer reviewed

Williamson, David M.; Xi, Xiaoming; Breyer, F. Jay – Educational Measurement: Issues and Practice, 2012

A framework for evaluation and use of automated scoring of constructed-response tasks is provided that entails both evaluation of automated scoring as well as guidelines for implementation and maintenance in the context of constantly evolving technologies. Consideration of validity issues and challenges associated with automated scoring are…

Descriptors: Automation, Scoring, Evaluation, Guidelines

Is Teaching Experience Necessary for Reliable Scoring of Extended English Questions?

Peer reviewed

Royal-Dawson, Lucy; Baird, Jo-Anne – Educational Measurement: Issues and Practice, 2009

Hundreds of thousands of raters are recruited internationally to score examinations, but little research has been conducted on the selection criteria for these raters. Many countries insist upon teaching experience as a selection criterion and this has frequently become embedded in the cultural expectations surrounding the tests. Shortages in…

Descriptors: National Curriculum, Scoring, Foreign Countries, Teaching Experience

Alignment of Mathematics State-Level Standards and Assessments: The Role of Reviewer Agreement

Peer reviewed

Webb, Noreen M.; Herman, Joan L.; Webb, Norman L. – Educational Measurement: Issues and Practice, 2007

This article examines the role of reviewer agreement in judgments about alignment between tests and standards. We used case data from three state alignment studies to explore how different approaches to incorporating reviewer agreement change alignment conclusions. The three case studies showed varying degrees of reviewer agreement about…

Descriptors: Test Items, Case Studies, Mathematics, Interrater Reliability

Effects of Assigning Raters to Items

Peer reviewed

Sykes, Robert C.; Ito, Kyoko; Wang, Zhen – Educational Measurement: Issues and Practice, 2008

Student responses to a large number of constructed response items in three Math and three Reading tests were scored on two occasions using three ways of assigning raters: single reader scoring, a different reader for each response (item-specific), and three readers each scoring a rater item block (RIB) containing approximately one-third of a…

Descriptors: Test Items, Mathematics Tests, Reading Tests, Scoring

Identifying Essential Topics in General and Special Education Introductory Assessment Textbooks

Peer reviewed

Campbell, Cynthia; Collins, Vicki L. – Educational Measurement: Issues and Practice, 2007

We reviewed the five top-selling introductory assessment textbooks in both general and special education to identify topics contained in textbooks and to determine the extent of agreement among authors regarding the essentialness of topics within and across discipline. Content analysis across the 10 assessment textbooks yielded 73 topics related…

Descriptors: Special Education, Content Analysis, Textbook Evaluation, Textbook Content
