ERIC - Search Results

Publication Date

In 2025	1
Since 2024	1
Since 2021 (last 5 years)	5
Since 2016 (last 10 years)	7
Since 2006 (last 20 years)	10

Descriptor

Decision Making	19
Evaluators	19
Performance Based Assessment	19
Standards	8
Evaluation Methods	7
Interrater Reliability	5
Simulation	5
Standard Setting (Scoring)	5
English (Second Language)	4
Foreign Countries	4
Scoring	4
Second Language Instruction	4
Teacher Evaluation	4
Case Studies	3
College Faculty	3
Elementary Secondary Education	3
Judges	3
Language Tests	3
Oral Language	3
Scores	3
Second Language Learning	3
Criteria	2
Educational Assessment	2
Feedback (Response)	2
High Stakes Tests	2

Source

Applied Measurement in…	5
Advances in Health Sciences…	1
Assessment & Evaluation in…	1
Educational Measurement:…	1
Educational Researcher	1
Educational and Psychological…	1
Journal of Educational…	1
Journal of Personnel…	1
Language Assessment Quarterly	1
Language Testing	1
Language Testing in Asia	1
ProQuest LLC	1
Research Evaluation	1
Studies in Applied…	1

Publication Type

Journal Articles	17
Reports - Research	9
Reports - Evaluative	7
Information Analyses	3
Speeches/Meeting Papers	2
Dissertations/Theses -…	1
Opinion Papers	1
Reports - Descriptive	1

Education Level

Higher Education	6
Postsecondary Education	5
High Schools	1
Secondary Education	1

Location

Japan	2
Australia	1
New York (New York)	1
Singapore	1
United Kingdom (Glasgow)	1

Showing 1 to 15 of 19 results

Revisiting the Effectiveness of a Performance Decision Tree-Style Rubric Compared to a Grid-Style Rubric

Peer reviewed

Yuichiro Yokouchi – Language Testing in Asia, 2025

The performance decision tree (PDT; Fulcher et al., 2011) is a rubric style that is applicable to performance assessment, with origins in Upshur and Turner's (1995) empirically derived binary-choice, boundary-definition (EBB) scale. It is easier for raters to assess performance by evaluating multiple binary-choice descriptors. Additionally,…

Descriptors: Scoring Rubrics, Second Language Learning, Second Language Instruction, Language Teachers

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

A Sequential Approach to Detecting Differential Rater Functioning in Sparse Rater-Mediated Assessment Networks

Peer reviewed

Wind, Stefanie A. – Language Testing, 2023

Researchers frequently evaluate rater judgments in performance assessments for evidence of differential rater functioning (DRF), which occurs when rater severity is systematically related to construct-irrelevant student characteristics after controlling for student achievement levels. However, researchers have observed that methods for detecting…

Descriptors: Evaluators, Decision Making, Student Characteristics, Performance Based Assessment

Exploring the Influence of Judge Proficiency on Standard-Setting Judgments

Peer reviewed

Peabody, Michael R.; Wind, Stefanie A. – Journal of Educational Measurement, 2019

Setting performance standards is a judgmental process involving human opinions and values as well as technical and empirical considerations. Although all cut score decisions are by nature somewhat arbitrary, they should not be capricious. Judges selected for standard-setting panels should have the proper qualifications to make the judgments asked…

Descriptors: Standard Setting, Decision Making, Performance Based Assessment, Evaluators

Generalizability of Writing Scores and Language Program Placement Decisions: Score Dependability, Task Variability, and Score Profiles on an ESL Placement Test

Peer reviewed

Eskin, Daniel – Studies in Applied Linguistics & TESOL, 2022

For agencies that deliver high-stakes Second Language (L2) proficiency exams, a research agenda has been undertaken for years to examine the role of rater, task, and rubric as sources of variability into their performance assessments (Lee, 2006; Sawaki & Sinharay, 2013; Xi, 2007; Xi & Mollaun, 2006). However, these challenges are more…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Student Placement

Referees or Sponsors? The Role of Evaluators in the Promotion of Research Scientists in a Public Research Organization

Peer reviewed

Glennie, Miriam; O'Donnell, Michael; Brown, Michelle; Benson, John – Research Evaluation, 2019

Evaluators play a central role in assessments of researchers' performance for reward, but the nature of their role and influence is not well understood. Ongoing reliance on evaluator judgement is typically justified as a need for referees in contests for reward, because quantitative performance measures alone can be subject to distortion. Yet, if…

Descriptors: Research Universities, Role, Evaluators, Focus Groups

Developing Interactive Oral Assessments to Foster Graduate Attributes in Higher Education

Peer reviewed

Tan, Chin Pei; Howes, Dora; Tan, Rendell K. W.; Dancza, Karina M. – Assessment & Evaluation in Higher Education, 2022

Interactive oral assessments demonstrate potential to develop graduate attributes such as critical thinking, professional communication and collaborative skills in students through authentic simulation of workplace scenarios. This study captured the design, delivery and evaluation of interactive oral assessments across three programmes --…

Descriptors: Oral Language, Interaction, Critical Thinking, Communication Skills

Seeing the Same Thing Differently

Peer reviewed

Yeates, Peter; O'Neill, Paul; Mann, Karen; Eva, Kevin – Advances in Health Sciences Education, 2013

Assessors' scores in performance assessments are known to be highly variable. Attempted improvements through training or rating format have achieved minimal gains. The mechanisms that contribute to variability in assessors' scoring remain unclear. This study investigated these mechanisms. We used a qualitative approach to study…

Descriptors: Performance Based Assessment, Scores, Evaluators, Scoring

An Alternative Decision-Making Procedure for Performance Assessments: Using the Multifaceted Rasch Model to Generate Cut Estimates

Peer reviewed

Kozaki, Yoko – Language Assessment Quarterly, 2010

This article describes an alternative approach to setting standards for performance assessments. The procedure was designed for use in low-budget, relatively low-stakes contexts where it is not possible to bring expert judges together. The procedure that allowed participant judges to work individually throughout the process was an effort to…

Descriptors: Performance Based Assessment, Standard Setting, Decision Making, Certification

Rater Effects in ITA Testing: ESL Teachers' versus American Undergraduates' Judgments of Accentedness, Comprehensibility, and Oral Proficiency

Hsieh, Ching-Ni – ProQuest LLC, 2011

Second language (L2) oral performance assessment always involves raters' subjective judgments and is thus subject to rater variability. The variability due to rater characteristics has important consequential impacts on decision-making processes, particularly in high-stakes testing situations (Bachman, Lynch, & Mason, 1995; A. Brown, 1995;…

Descriptors: Undergraduate Students, Phonology, Teaching Assistants, Foreign Students

Interjudge Reliability and Decision Reproducibility.

Peer reviewed

Lunz, Mary E.; And Others – Educational and Psychological Measurement, 1994

In a study involving eight judges, analysis with the FACETS model provides evidence that judges grade differently, whether or not scores correlate well. This outcome suggests that adjustments for differences among judges should be made before student measures are estimated to produce reproducible decisions. (SLD)

Descriptors: Correlation, Decision Making, Evaluation Methods, Evaluators

Something Old, Something New, Something Borrowed, a Lot to Do.

Peer reviewed

Berk, Ronald A. – Applied Measurement in Education, 1995

A brief summary of standard setting knowledge is presented, derived from about 20 methods that utilize a judgmental review process, the approach most relevant to the standard-setting strategies proposed in this special issue. Criteria for judging effectiveness and critiques of the methods discussed in the issue are offered. (SLD)

Descriptors: Criteria, Decision Making, Educational History, Evaluation Methods

Comments on Methods of Setting Standards for Complex Performance Tasks.

Peer reviewed

Mills, Craig N. – Applied Measurement in Education, 1995

The articles of this special issue propose two methods of deriving an initial standard and one method for determining the extent to which the standard should include compensation. Much work remains to be done on further development of the methods and the larger issues of policy regarding performance assessment. (SLD)

Descriptors: Decision Making, Educational Policy, Evaluation Methods, Evaluators

Setting Performance Standards through Two-Stage Judgmental Policy Capturing.

Peer reviewed

Jaeger, Richard M. – Applied Measurement in Education, 1995

A performance-standard setting procedure termed judgmental policy capturing (JPC) and its application are described. A study involving 12 panelists demonstrated the feasibility of the JPC method for setting performance standards for classroom teachers seeking certification from the National Board for Professional Teaching Standards. (SLD)

Descriptors: Decision Making, Educational Assessment, Evaluation Methods, Evaluators

An Integration and Reprise: What We Think We Have Learned.

Peer reviewed

Plake, Barbara S. – Applied Measurement in Education, 1995

The three standard-setting approaches described in this special issue are summarized and contrasted: (1) judgmental policy capturing; (2) the extended Angoff method; and (3) the dominant profile method. An integrative summary of findings is followed by recommendations for modifying the methods. (SLD)

Descriptors: Decision Making, Elementary Secondary Education, Evaluation Methods, Evaluators

Author

Wind, Stefanie A.	3
Benson, John	1
Berk, Ronald A.	1
Brown, Michelle	1
Brull, Harry	1
Collins, Kathleen M.	1
Dancza, Karina M.	1
Delandshere, Ginette	1
Eskin, Daniel	1
Eva, Kevin	1
Glennie, Miriam	1
Howes, Dora	1
Hsieh, Ching-Ni	1
Jaeger, Richard M.	1
Kaiser, Paul D.	1
Kozaki, Yoko	1
Lunz, Mary E.	1
Mann, Karen	1
Mills, Craig N.	1
Moss, Pamela A.	1
O'Donnell, Michael	1
O'Neill, Paul	1
Peabody, Michael R.	1
Petrosky, Anthony R.	1