ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	0
Since 2016 (last 10 years)	1
Since 2006 (last 20 years)	1

Descriptor

Evaluation Methods	6
Performance Based Assessment	6
Writing Tests	6
Scoring	4
Writing Evaluation	3
Essays	2
Evaluators	2
Student Evaluation	2
Test Reliability	2
Administrator Attitudes	1
Cognitive Processes	1
Content Analysis	1
Criteria	1
Educational Assessment	1
Elementary Secondary Education	1
English Instruction	1
Evaluation	1
Experience	1
Feedback	1
Generalizability Theory	1
High Stakes Tests	1
Higher Education	1
Holistic Approach	1
Interrater Reliability	1
Language Tests	1

Source

Diagnostique	1
Educational and Psychological…	1

Author

Wolfe, Edward W.	2
Chen, Michelle Y.	1
Crehan, Kevin D.	1
Feltovich, Brian	1
Howell, Kenneth W.	1
Kao, Chi-Wen	1
Liu, Yan	1
Saunders, Pearl I.	1
Zumbo, Bruno D.	1

Publication Type

Reports - Research	4
Speeches/Meeting Papers	4
Journal Articles	2
Reports - Evaluative	2

Location

Arizona

Showing all 6 results

A Propensity Score Method for Investigating Differential Item Functioning in Performance Assessment

Peer reviewed

Chen, Michelle Y.; Liu, Yan; Zumbo, Bruno D. – Educational and Psychological Measurement, 2020

This study introduces a novel differential item functioning (DIF) method based on propensity score matching that tackles two challenges in analyzing performance assessment data: continuous task scores and the lack of a reliable internal variable to serve as a proxy for ability or aptitude. The proposed DIF method consists of two main stages. First,…

Descriptors: Probability, Scores, Evaluation Methods, Test Items

The Relationship between Scoring Procedures and Focus and the Reliability of Direct Writing Assessment Scores.

Wolfe, Edward W.; Kao, Chi-Wen – 1996

This paper reports the results of an analysis of the relationship between scorer behaviors and score variability. Thirty-six essay scorers were interviewed and asked to perform a think-aloud task as they scored 24 essays. Each comment made by a scorer was coded according to its content focus (i.e., appearance, assignment, mechanics, communication,…

Descriptors: Content Analysis, Educational Assessment, Essays, Evaluation Methods

A Discussion of Analytic Scoring for Writing Performance Assessments.

Crehan, Kevin D. – 1997

Writing fits well within the realm of outcomes suitable for observation by performance assessments. Studies of the reliability of performance assessments have suggested that interrater reliability can be consistently high. Scoring consistency, however, is only one aspect of quality in decisions based on assessment results. Another is…

Descriptors: Evaluation Methods, Feedback, Generalizability Theory, Interrater Reliability

Learning To Rate Essays: A Study of Scorer Cognition.

Wolfe, Edward W.; Feltovich, Brian – 1994

This paper presents a model of scorer cognition that incorporates two types of mental models: models of performance (i.e., the criteria for judging performance) and models of scoring (i.e., the procedural scripts for scoring an essay). In Study 1, six novice and five experienced scorers wrote definitions of three levels of a 6-point holistic…

Descriptors: Cognitive Processes, Criteria, Essays, Evaluation Methods

Primary Trait Scoring: A Direct Assessment Option for Educators.

Saunders, Pearl I. – 1999

The paper examines an assessment method for measuring students' writing performance: does Primary Trait Scoring reliably and validly accomplish the administrative, instructional, and evaluative purposes of writing assessment? The Primary Trait Scoring guide rests on a few underlying principles: identification of qualities of effective writing;…

Descriptors: English Instruction, Evaluation, Evaluation Methods, Higher Education

Bias in Authentic Assessment.

Howell, Kenneth W.; And Others – Diagnostique, 1993

This survey of educators examined validity issues in Arizona's program of authentic assessment of written communication. The paper concludes that authentic measures lack meaningful standards. Major flaws were reported in the areas of "fairness," "transfer and generalizability," "content quality," and…

Descriptors: Administrator Attitudes, Elementary Secondary Education, Evaluation Methods, Minority Groups