ERIC - Search Results

Showing all 14 results

Evaluating Random and Systematic Error in Student Growth Percentiles

Peer reviewed

Wells, Craig S.; Sireci, Stephen G. – Applied Measurement in Education, 2020

Student growth percentiles (SGPs) are currently used by several states and school districts to provide information about individual students as well as to evaluate teachers, schools, and school districts. For SGPs to be defensible for these purposes, they should be reliable. In this study, we examine the amount of systematic and random error in…

Descriptors: Growth Models, Reliability, Scores, Error Patterns

Of Small Beauties and Large Beasts: The Quality of Distractors on Multiple-Choice Tests Is More Important than Their Quantity

Peer reviewed

Papenberg, Martin; Musch, Jochen – Applied Measurement in Education, 2017

In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…

Descriptors: Multiple Choice Tests, Test Items, Test Validity, Test Reliability

Validating Human and Automated Scoring of Essays against "True" Scores

Peer reviewed

Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018

In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to the essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…

Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing

Evaluating Comparative Judgment as an Approach to Essay Scoring

Peer reviewed

Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016

As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…

Descriptors: Essays, Scoring, Comparative Analysis, Evaluators

Stability of Teacher Value-Added Rankings across Measurement Model and Scaling Conditions

Peer reviewed

Hawley, Leslie R.; Bovaird, James A.; Wu, ChaoRong – Applied Measurement in Education, 2017

Value-added assessment methods have been criticized by researchers and policy makers for a number of reasons. One issue includes the sensitivity of model results across different outcome measures. This study examined the utility of incorporating multivariate latent variable approaches within a traditional value-added framework. We evaluated the…

Descriptors: Value Added Models, Reliability, Multivariate Analysis, Scaling

Increasing the Validity of Angoff Standards through Analysis of Judge-Level Internal Consistency

Peer reviewed

Clauser, Jerome C.; Clauser, Brian E.; Hambleton, Ronald K. – Applied Measurement in Education, 2014

The purpose of the present study was to extend past work with the Angoff method for setting standards by examining judgments at the judge level rather than the panel level. The focus was on investigating the relationship between observed Angoff standard setting judgments and empirical conditional probabilities. This relationship has been used as a…

Descriptors: Standard Setting (Scoring), Validity, Reliability, Correlation

Validating Automated Essay Scoring: A (Modest) Refinement of the "Gold Standard"

Peer reviewed

Powers, Donald E.; Escoffery, David S.; Duchnowski, Matthew P. – Applied Measurement in Education, 2015

By far, the most frequently used method of validating (the interpretation and use of) automated essay scores has been to compare them with scores awarded by human raters. Although this practice is questionable, human-machine agreement is still often regarded as the "gold standard." Our objective was to refine this model and apply it to…

Descriptors: Essays, Test Scoring Machines, Program Validation, Criterion Referenced Tests

Criterion-Focused Approach to Reducing Adverse Impact in College Admissions

Peer reviewed

Sinha, Ruchi; Oswald, Frederick; Imus, Anna; Schmitt, Neal – Applied Measurement in Education, 2011

The current study examines how using a multidimensional battery of predictors (high-school grade point average (GPA), SAT/ACT, and biodata), and weighting the predictors based on the different values institutions place on various student performance dimensions (college GPA, organizational citizenship behaviors (OCBs), and behaviorally anchored…

Descriptors: Grade Point Average, Interrater Reliability, Rating Scales, College Admission

Providing Subscale Scores for Diagnostic Information: A Case Study when the Test Is Essentially Unidimensional

Peer reviewed

Stone, Clement A.; Ye, Feifei; Zhu, Xiaowen; Lane, Suzanne – Applied Measurement in Education, 2010

Although reliability of subscale scores may be suspect, subscale scores are the most common type of diagnostic information included in student score reports. This research compared methods for augmenting the reliability of subscale scores for an 8th-grade mathematics assessment. Yen's Objective Performance Index, Wainer et al.'s augmented scores,…

Descriptors: Item Response Theory, Case Studies, Reliability, Scores

The Critical Role of Anchor Paper Selection in Writing Assessment

Peer reviewed

Osborn Popp, Sharon E.; Ryan, Joseph M.; Thompson, Marilyn S. – Applied Measurement in Education, 2009

Scoring rubrics are routinely used to evaluate the quality of writing samples produced for writing performance assessments, with anchor papers chosen to represent score points defined in the rubric. Although the careful selection of anchor papers is associated with best practices for scoring, little research has been conducted on the role of…

Descriptors: Writing Evaluation, Scoring Rubrics, Selection, Scoring

The Effects of Nonnormality and Number of Response Categories on Reliability.

Peer reviewed

Bandalos, Deborah L.; Enders, Craig K. – Applied Measurement in Education, 1996

Computer simulation indicated that reliability increased with the degree of similarity between underlying and observed distributions when the observed categorical distribution was deliberately constructed to match the shape of the underlying distribution of the trait being measured. Reliability also increased with correlation among variables and…

Descriptors: Computer Simulation, Correlation, Likert Scales, Reliability

Issues Related to Judging the Alignment of Curriculum Standards and Assessments

Peer reviewed

Webb, Norman L. – Applied Measurement in Education, 2007

A process for judging the alignment between curriculum standards and assessments developed by the author is presented. This process produces information on the relationship of standards and assessments on four alignment criteria: Categorical Concurrence, Depth of Knowledge Consistency, Range of Knowledge Correspondence, and Balance of…

Descriptors: Educational Assessment, Academic Standards, Item Analysis, Interrater Reliability

Can Validity Rise When Reliability Declines?

Peer reviewed

Feldt, Leonard S. – Applied Measurement in Education, 1997

It has often been asserted that the reliability of a measure places an upper limit on its validity. This article demonstrates in theory that validity can rise when reliability declines, even when validity evidence is a correlation with an acceptable criterion. Whether empirical examples can actually be found is an open question. (SLD)

Descriptors: Correlation, Criteria, Reliability, Test Construction

Cross-State Comparability of Judgments of Student Writing: Results from the New Standards Project.

Peer reviewed

Linn, Robert L.; And Others – Applied Measurement in Education, 1992

Ten states participated in a cross-state scoring workshop in 1991, evaluating writing from elementary school, middle school, and high school students. Correlations of scores assigned by readers from one state with those assigned by readers from another state were generally quite high. Implications for defining common standards are discussed. (SLD)

Descriptors: Comparative Analysis, Correlation, Elementary School Students, Elementary Secondary Education
