Wells, Craig S.; Sireci, Stephen G. – Applied Measurement in Education, 2020
Student growth percentiles (SGPs) are currently used by several states and school districts to provide information about individual students as well as to evaluate teachers, schools, and school districts. For SGPs to be defensible for these purposes, they should be reliable. In this study, we examine the amount of systematic and random error in…
Descriptors: Growth Models, Reliability, Scores, Error Patterns
Papenberg, Martin; Musch, Jochen – Applied Measurement in Education, 2017
In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…
Descriptors: Multiple Choice Tests, Test Items, Test Validity, Test Reliability
Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018
In the current study, two pools of 250 essays, all written in response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to each essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…
Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
Hawley, Leslie R.; Bovaird, James A.; Wu, ChaoRong – Applied Measurement in Education, 2017
Value-added assessment methods have been criticized by researchers and policy makers for a number of reasons. One issue includes the sensitivity of model results across different outcome measures. This study examined the utility of incorporating multivariate latent variable approaches within a traditional value-added framework. We evaluated the…
Descriptors: Value Added Models, Reliability, Multivariate Analysis, Scaling
Clauser, Jerome C.; Clauser, Brian E.; Hambleton, Ronald K. – Applied Measurement in Education, 2014
The purpose of the present study was to extend past work with the Angoff method for setting standards by examining judgments at the judge level rather than the panel level. The focus was on investigating the relationship between observed Angoff standard setting judgments and empirical conditional probabilities. This relationship has been used as a…
Descriptors: Standard Setting (Scoring), Validity, Reliability, Correlation
Powers, Donald E.; Escoffery, David S.; Duchnowski, Matthew P. – Applied Measurement in Education, 2015
By far, the most frequently used method of validating (the interpretation and use of) automated essay scores has been to compare them with scores awarded by human raters. Although this practice is questionable, human-machine agreement is still often regarded as the "gold standard." Our objective was to refine this model and apply it to…
Descriptors: Essays, Test Scoring Machines, Program Validation, Criterion Referenced Tests
Sinha, Ruchi; Oswald, Frederick; Imus, Anna; Schmitt, Neal – Applied Measurement in Education, 2011
The current study examines how using a multidimensional battery of predictors (high-school grade point average (GPA), SAT/ACT, and biodata), and weighting the predictors based on the different values institutions place on various student performance dimensions (college GPA, organizational citizenship behaviors (OCBs), and behaviorally anchored…
Descriptors: Grade Point Average, Interrater Reliability, Rating Scales, College Admission
Stone, Clement A.; Ye, Feifei; Zhu, Xiaowen; Lane, Suzanne – Applied Measurement in Education, 2010
Although reliability of subscale scores may be suspect, subscale scores are the most common type of diagnostic information included in student score reports. This research compared methods for augmenting the reliability of subscale scores for an 8th-grade mathematics assessment. Yen's Objective Performance Index, Wainer et al.'s augmented scores,…
Descriptors: Item Response Theory, Case Studies, Reliability, Scores
Osborn Popp, Sharon E.; Ryan, Joseph M.; Thompson, Marilyn S. – Applied Measurement in Education, 2009
Scoring rubrics are routinely used to evaluate the quality of writing samples produced for writing performance assessments, with anchor papers chosen to represent score points defined in the rubric. Although the careful selection of anchor papers is associated with best practices for scoring, little research has been conducted on the role of…
Descriptors: Writing Evaluation, Scoring Rubrics, Selection, Scoring

Bandalos, Deborah L.; Enders, Craig K. – Applied Measurement in Education, 1996
Computer simulation indicated that reliability increased with the degree of similarity between underlying and observed distributions when the observed categorical distribution was deliberately constructed to match the shape of the underlying distribution of the trait being measured. Reliability also increased with correlation among variables and…
Descriptors: Computer Simulation, Correlation, Likert Scales, Reliability
Webb, Norman L. – Applied Measurement in Education, 2007
A process for judging the alignment between curriculum standards and assessments developed by the author is presented. This process produces information on the relationship of standards and assessments on four alignment criteria: Categorical Concurrence, Depth of Knowledge Consistency, Range of Knowledge Correspondence, and Balance of…
Descriptors: Educational Assessment, Academic Standards, Item Analysis, Interrater Reliability

Feldt, Leonard S. – Applied Measurement in Education, 1997
It has often been asserted that the reliability of a measure places an upper limit on its validity. This article demonstrates in theory that validity can rise when reliability declines, even when validity evidence is a correlation with an acceptable criterion. Whether empirical examples can actually be found is an open question. (SLD)
Descriptors: Correlation, Criteria, Reliability, Test Construction

Linn, Robert L.; And Others – Applied Measurement in Education, 1992
Ten states participated in a cross-state scoring workshop in 1991, evaluating writing from elementary school, middle school, and high school students. Correlations between scores assigned by readers from one state and those assigned by readers from another state were generally quite high. Implications for defining common standards are discussed. (SLD)
Descriptors: Comparative Analysis, Correlation, Elementary School Students, Elementary Secondary Education