Showing all 7 results
Peer reviewed
Bejar, Isaac I.; Li, Chen; McCaffrey, Daniel – Applied Measurement in Education, 2020
We evaluate the feasibility of developing predictive models of rater behavior, that is, "rater-specific" models for predicting the scores produced by a rater under operational conditions. In the present study, the dependent variable is the score assigned to essays by a rater, and the predictors are linguistic attributes of the essays…
Descriptors: Scoring, Essays, Behavior, Predictive Measurement
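As a rough illustration of the modeling setup this abstract describes (a single rater's scores regressed on linguistic attributes of essays), here is a minimal sketch; the feature set, score scale, simulated data, and choice of a linear model are assumptions for illustration, not details taken from the study:

```python
# Hypothetical sketch of a "rater-specific" model: predict one rater's essay
# scores from linguistic attributes. All data below are simulated; the
# features are illustrative, not those used by Bejar, Li, and McCaffrey.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n_essays = 500

# Illustrative linguistic predictors, e.g. word count, mean sentence length,
# type-token ratio, and an error-rate proxy (standardized).
X = rng.normal(size=(n_essays, 4))

# Simulated scores from one rater on a 1-6 holistic scale.
y = np.clip(
    np.round(3.5 + X @ [0.6, 0.3, 0.4, -0.5] + rng.normal(scale=0.7, size=n_essays)),
    1, 6,
)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# How closely does the model reproduce this rater's held-out scores?
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"RMSE: {rmse:.3f}")
```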
Peer reviewed
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
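One common statistic for appraising the scoring performance of an AES system is its agreement with human raters, often measured with quadratically weighted kappa. The sketch below is an assumption-laden illustration of that agreement check (the scores are fabricated), not the validation framework the article discusses:

```python
# Hypothetical sketch: quadratic weighted kappa (QWK) between human and
# AES scores. The score vectors are fabricated for illustration only.
from sklearn.metrics import cohen_kappa_score

human = [3, 4, 2, 5, 3, 4, 1, 3, 5, 2]
aes   = [3, 4, 3, 5, 3, 3, 2, 3, 4, 2]

qwk = cohen_kappa_score(human, aes, weights="quadratic")
print(f"Quadratic weighted kappa: {qwk:.3f}")
```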
Peer reviewed
Ketterlin-Geller, Leanne R.; Perry, Lindsey; Adams, Elizabeth – Applied Measurement in Education, 2019
Despite the call for an argument-based approach to validity over 25 years ago, few examples exist in the published literature. One possible explanation for this outcome is that the complexity of the argument-based approach makes implementation difficult. To counter this claim, we propose that the Assessment Triangle can serve as the overarching…
Descriptors: Validity, Educational Assessment, Models, Screening Tests
Peer reviewed
Ercikan, Kadriye; Oliveri, María Elena – Applied Measurement in Education, 2016
Assessing complex constructs such as those discussed under the umbrella of 21st century constructs highlights the need for a principled assessment design and validation approach. In our discussion, we make a case for three considerations: (a) taking construct complexity into account across various stages of assessment development such as the…
Descriptors: Evaluation Methods, Test Construction, Design, Scaling
Peer reviewed
Lee, Hee-Sun; Liu, Ou Lydia; Linn, Marcia C. – Applied Measurement in Education, 2011
This study explores measurement of a construct called knowledge integration in science using multiple-choice and explanation items. We use construct and instructional validity evidence to examine the role multiple-choice and explanation items play in measuring students' knowledge integration ability. For construct validity, we analyze item…
Descriptors: Knowledge Level, Construct Validity, Validity, Scaffolding (Teaching Technique)
Peer reviewed
Hambleton, Ronald K.; Slater, Sharon C. – Applied Measurement in Education, 1997
A brief history of developments in the assessment of the reliability of credentialing examinations is presented, and some new results are outlined that highlight the interactions among scoring, standard setting, and the reliability and validity of pass-fail decisions. Decision consistency is an important concept in evaluating credentialing…
Descriptors: Certification, Credentials, Decision Making, Interaction
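Decision consistency, as mentioned in this abstract, is commonly understood as the proportion of examinees who would receive the same pass/fail classification across parallel measurement replications. The following sketch, with fabricated scores and an assumed cut score, illustrates the idea; it is not the authors' procedure:

```python
# Hypothetical sketch of a decision consistency index: the proportion of
# examinees classified the same way (pass/fail) on two parallel forms.
# Abilities, error variance, and the cut score are fabricated.
import numpy as np

rng = np.random.default_rng(1)
true_ability = rng.normal(70, 10, size=1000)
form_a = true_ability + rng.normal(0, 5, size=1000)  # form A with error
form_b = true_ability + rng.normal(0, 5, size=1000)  # parallel form B

cut = 65.0  # illustrative passing standard
pass_a, pass_b = form_a >= cut, form_b >= cut

p_consistent = np.mean(pass_a == pass_b)  # decision consistency index
print(f"Decision consistency: {p_consistent:.3f}")
```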
Peer reviewed
Plake, Barbara S. – Applied Measurement in Education, 1995
This article provides a framework for the rest of the articles in this special issue comparing the utility of three standard-setting methods with complex performance assessments. The context of the standard setting study is described, and the methods are outlined. (SLD)
Descriptors: Comparative Analysis, Criteria, Decision Making, Educational Assessment