Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025
Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…
Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods

Burkhardt, Amy; Lottridge, Susan; Woolf, Sherri – Educational Measurement: Issues and Practice, 2021
For some students, standardized tests serve as a conduit to disclose sensitive issues of harm or distress that may otherwise go unreported. By detecting this writing, known as "crisis papers," testing programs have a unique opportunity to assist in mitigating the risk of harm to these students. The use of machine learning to…
Descriptors: Scoring Rubrics, Identification, At Risk Students, Standardized Tests

Williamson, David M.; Xi, Xiaoming; Breyer, F. Jay – Educational Measurement: Issues and Practice, 2012
A framework for the evaluation and use of automated scoring of constructed-response tasks is provided that entails both evaluation of automated scoring and guidelines for implementation and maintenance in the context of constantly evolving technologies. Consideration of validity issues and challenges associated with automated scoring is…
Descriptors: Automation, Scoring, Evaluation, Guidelines

Royal-Dawson, Lucy; Baird, Jo-Anne – Educational Measurement: Issues and Practice, 2009
Hundreds of thousands of raters are recruited internationally to score examinations, but little research has been conducted on the selection criteria for these raters. Many countries insist upon teaching experience as a selection criterion, and this has frequently become embedded in the cultural expectations surrounding the tests. Shortages in…
Descriptors: National Curriculum, Scoring, Foreign Countries, Teaching Experience

Sykes, Robert C.; Ito, Kyoko; Wang, Zhen – Educational Measurement: Issues and Practice, 2008
Student responses to a large number of constructed response items in three Math and three Reading tests were scored on two occasions using three ways of assigning raters: single reader scoring, a different reader for each response (item-specific), and three readers each scoring a rater item block (RIB) containing approximately one-third of a…
Descriptors: Test Items, Mathematics Tests, Reading Tests, Scoring

Linn, Robert L.; Burton, Elizabeth – Educational Measurement: Issues and Practice, 1994
Generalizability of performance-based assessment scores across raters and tasks is examined, focusing on implications of generalizability analyses for specific uses and interpretations of assessment results. Although it seems probable that assessment conditions, task characteristics, and interactions with instructional experiences affect the…
Descriptors: Educational Assessment, Educational Experience, Generalizability Theory, Interaction

Geisinger, Kurt F. – Educational Measurement: Issues and Practice, 1991
Ways to use standard-setting data to adjust cutoff scores on examinations are reviewed. Ten sources of information to be used in determining standards are listed. The decision to modify passing scores should be based on these types of information and consideration of adverse impact or rating process irregularities. (SLD)
Descriptors: Cutting Scores, Evaluation Utilization, Evaluators, Interrater Reliability

Plake, Barbara S.; And Others – Educational Measurement: Issues and Practice, 1991
Possible sources of intrajudge inconsistency in standard setting are reviewed, and approaches are presented to improve the accuracy of rating. Procedures for providing judges with feedback through discussion or computerized communication are discussed. Monitoring and maintaining judges' consistency throughout the rating process are essential. (SLD)
Descriptors: Computer Assisted Instruction, Evaluators, Examiners, Feedback

Mills, Craig N.; And Others – Educational Measurement: Issues and Practice, 1991
An approach is presented to the definition of minimal competence for judges to use in standard setting. Panelists in standard setting must receive training to ensure that differences in rating result from differences in perceptions of item difficulty, not in differences of opinion about the definition of minimal competence. (SLD)
Descriptors: Cutting Scores, Decision Making, Definitions, Difficulty Level

Jaeger, Richard M. – Educational Measurement: Issues and Practice, 1991
Issues concerning the selection of judges for standard setting are discussed. Determining the consistency of judges' recommendations, or their congruity with other expert recommendations, would help in selection. Enough judges must be chosen to allow estimation of recommendations by an entire population of judges. (SLD)
Descriptors: Cutting Scores, Evaluation Methods, Evaluators, Examiners

Reid, Jerry B. – Educational Measurement: Issues and Practice, 1991
Training judges to generate item ratings in standard setting once the reference group has been defined is discussed. It is proposed that sensitivity to the factors that determine difficulty can be improved through training. Three criteria for determining when training is sufficient are offered. (SLD)
Descriptors: Computer Assisted Instruction, Difficulty Level, Evaluators, Interrater Reliability