Showing 1 to 15 of 17 results
Peer reviewed
Wise, Steven L. – Applied Measurement in Education, 2019
The identification of rapid guessing is important to promote the validity of achievement test scores, particularly with low-stakes tests. Effective methods for identifying rapid guesses require reliable threshold methods that are also aligned with test taker behavior. Although several common threshold methods are based on rapid guessing response…
Descriptors: Guessing (Tests), Identification, Reaction Time, Reliability
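As a hedged illustration of the threshold-based flagging this line of work relies on (not Wise's specific procedure), the Python sketch below applies a simple normative rule: a response is flagged as a rapid guess when its time falls below a fixed fraction of the item's mean response time. The 10% fraction and the 10-second cap are illustrative assumptions, not values from the article.

def rapid_guess_flags(response_times, fraction=0.10, cap_seconds=10.0):
    """response_times: list of per-examinee lists of item response times (seconds).
    Returns parallel lists of booleans; True marks a suspected rapid guess."""
    n_items = len(response_times[0])
    # Mean response time per item, computed across examinees.
    item_means = [sum(rt[i] for rt in response_times) / len(response_times)
                  for i in range(n_items)]
    # Per-item threshold: a fraction of the mean time, capped at cap_seconds (assumed values).
    thresholds = [min(fraction * m, cap_seconds) for m in item_means]
    # Flag any response faster than its item's threshold.
    return [[t < thresholds[i] for i, t in enumerate(rt)] for rt in response_times]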
Peer reviewed
Schmidgall, Jonathan – Applied Measurement in Education, 2017
This study utilizes an argument-based approach to validation to examine the implications of reliability in order to further differentiate the concepts of score and decision consistency. In a methodological example, the framework of generalizability theory was used to estimate appropriate indices of score consistency and evaluations of the…
Descriptors: Scores, Reliability, Validity, Generalizability Theory
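For readers unfamiliar with the distinction drawn here, generalizability theory separates consistency of relative standing (score consistency) from consistency of absolute classifications (decision consistency) with two indices. In a simple persons-by-raters design with n_r raters, the standard forms are

E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pr,e}/n_r}, \qquad \Phi = \frac{\sigma^2_p}{\sigma^2_p + (\sigma^2_r + \sigma^2_{pr,e})/n_r},

where the generalizability coefficient E\rho^2 ignores rater main effects and the index of dependability \Phi charges them against the score. These are the textbook formulas, not necessarily the exact indices estimated in the article.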
Peer reviewed
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
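Comparative judgment designs typically turn the accumulated pairwise decisions into scores with a paired-comparison model; a common choice is the Bradley-Terry form, in which essay i is judged better than essay j with probability

P(i \succ j) = \frac{\exp(\theta_i)}{\exp(\theta_i) + \exp(\theta_j)},

and the estimated quality parameters \theta_i serve as the essay scores. This is a standard formulation offered for orientation; the article may use a different estimation approach.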
Peer reviewed
Clauser, Jerome C.; Clauser, Brian E.; Hambleton, Ronald K. – Applied Measurement in Education, 2014
The purpose of the present study was to extend past work with the Angoff method for setting standards by examining judgments at the judge level rather than the panel level. The focus was on investigating the relationship between observed Angoff standard setting judgments and empirical conditional probabilities. This relationship has been used as a…
Descriptors: Standard Setting (Scoring), Validity, Reliability, Correlation
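For orientation, in the Angoff method each judge estimates the probability p_ij that a minimally competent examinee would answer item i correctly, and the panel cut score is typically the sum of the judge-averaged ratings,

\text{cut score} = \sum_{i=1}^{k} \bar{p}_i.

Examining judgments at the judge level, as this study does, means comparing individual judges' p_ij values with empirical conditional probabilities of success rather than comparing only the panel aggregate. (A generic statement of the method, not the study's exact computations.)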
Peer reviewed
Deunk, Marjolein I.; van Kuijk, Mechteld F.; Bosker, Roel J. – Applied Measurement in Education, 2014
Standard setting methods, like the Bookmark procedure, are used to assist education experts in formulating performance standards. Small group discussion is meant to help these experts set more reliable and valid cutoff scores. This study is an analysis of 15 small group discussions during two standard setting trajectories and their effect…
Descriptors: Cutting Scores, Standard Setting, Group Discussion, Reading Tests
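In the Bookmark procedure, items are ordered by difficulty and each panelist places a bookmark at the last item a minimally qualified examinee should answer correctly with a specified response probability (often RP = .67). Under a two-parameter logistic IRT model, the implied cut point on the ability scale is

\theta_{\text{cut}} = b_j + \frac{1}{a_j}\ln\!\left(\frac{RP}{1 - RP}\right),

where a_j and b_j are the discrimination and difficulty of the bookmarked item. This is the usual textbook description, offered for orientation; the RP value and model used in the study are not given in the abstract.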
Peer reviewed
Leighton, Jacqueline P. – Applied Measurement in Education, 2013
The Standards for Educational and Psychological Testing indicate that multiple sources of validity evidence should be used to support the interpretation of test scores. In the past decade, examinee response processes, as a source of validity evidence, have received increased attention. However, there have been relatively few methodological studies…
Descriptors: Psychological Testing, Standards, Interviews, Protocol Analysis
Peer reviewed
Stone, Clement A.; Ye, Feifei; Zhu, Xiaowen; Lane, Suzanne – Applied Measurement in Education, 2010
Although reliability of subscale scores may be suspect, subscale scores are the most common type of diagnostic information included in student score reports. This research compared methods for augmenting the reliability of subscale scores for an 8th-grade mathematics assessment. Yen's Objective Performance Index, Wainer et al.'s augmented scores,…
Descriptors: Item Response Theory, Case Studies, Reliability, Scores
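The augmentation methods named here build on Kelley's regressed-score idea, in which an observed subscore x is shrunk toward the group mean in proportion to its reliability:

\hat{\tau} = \rho_{xx'}\,x + (1 - \rho_{xx'})\,\bar{x}.

Wainer et al.'s augmented scores extend this univariate estimator by also borrowing strength from the examinee's other subscale scores, which is what raises the effective reliability of the reported subscores. (A conceptual sketch; the article gives the full multivariate estimators and their comparison.)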
Peer reviewed
Yin, Yue; Shavelson, Richard J. – Applied Measurement in Education, 2008
In the first part of this article, the use of Generalizability (G) theory in examining the dependability of concept map assessment scores and designing a concept map assessment for a particular practical application is discussed. In the second part, the application of G theory is demonstrated by comparing the technical qualities of two frequently…
Descriptors: Generalizability Theory, Concept Mapping, Validity, Reliability
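The design question mentioned here is handled in G theory by a decision (D) study: once variance components are estimated, dependability can be projected for alternative numbers of occasions or raters. In a persons-by-occasions design, for example,

E\rho^2(n_o) = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{po,e}/n_o},

so increasing the number of mapping occasions n_o shrinks the error term. (A generic D-study formula; the specific facets compared in the article are not listed in the abstract.)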
Peer reviewed
Osborn Popp, Sharon E.; Ryan, Joseph M.; Thompson, Marilyn S. – Applied Measurement in Education, 2009
Scoring rubrics are routinely used to evaluate the quality of writing samples produced for writing performance assessments, with anchor papers chosen to represent score points defined in the rubric. Although the careful selection of anchor papers is associated with best practices for scoring, little research has been conducted on the role of…
Descriptors: Writing Evaluation, Scoring Rubrics, Selection, Scoring
Peer reviewed
Kane, Michael; Case, Susan M. – Applied Measurement in Education, 2004
The scores on 2 distinct tests (e.g., essay and objective) are often combined to create a composite score, which is used to make decisions. The validity of the observed composite can sometimes be evaluated relative to an external criterion. However, in cases where no criterion is available, the observed composite has generally been evaluated in…
Descriptors: Validity, Weighted Scores, Reliability, Student Evaluation
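As background for the composite-score problem described here, an observed composite of two tests is usually a weighted sum, and under classical assumptions (uncorrelated errors across the two tests) its reliability follows from the component variances, reliabilities, and covariance:

C = w_1 X_1 + w_2 X_2, \qquad \rho_{CC'} = \frac{w_1^2\sigma_1^2\rho_{11'} + w_2^2\sigma_2^2\rho_{22'} + 2 w_1 w_2 \sigma_{12}}{w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2 w_1 w_2 \sigma_{12}}.

These standard results are offered for context; the article's contribution concerns how to evaluate the validity of such a composite when no external criterion is available.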
Peer reviewed
Feldt, Leonard S. – Applied Measurement in Education, 1997
It has often been asserted that the reliability of a measure places an upper limit on its validity. This article demonstrates in theory that validity can rise when reliability declines, even when validity evidence is a correlation with an acceptable criterion. Whether empirical examples can actually be found is an open question. (SLD)
Descriptors: Correlation, Criteria, Reliability, Test Construction
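The "upper limit" assertion Feldt examines is usually stated through the attenuation inequality: for a test X and criterion Y,

\rho_{XY} \le \sqrt{\rho_{XX'}\,\rho_{YY'}} \le \sqrt{\rho_{XX'}},

so at any fixed test configuration validity cannot exceed the square root of reliability. Feldt's demonstration does not overturn this bound; it shows that modifying a test changes both quantities at once, so a drop in \rho_{XX'} can be accompanied by a rise in \rho_{XY} as long as the inequality still holds.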
Peer reviewed
Frary, Robert B. – Applied Measurement in Education, 2000
Characterizes the circumstances under which validity changes may occur as a result of the deletion of a predictor test segment. Equations show that, for a positive outcome, one should seek a relatively large correlation between the scores from the deleted segment and the remaining items, with a relatively low correlation between scores from the…
Descriptors: Equations (Mathematics), Prediction, Reliability, Scores
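The algebra behind this claim can be sketched directly (the notation is mine, not necessarily Frary's). Let D be the deleted segment, R the retained items, Y the criterion, and X = R + D the full predictor. Then

r_{XY} = \frac{\sigma_R\,r_{RY} + \sigma_D\,r_{DY}}{\sqrt{\sigma_R^2 + \sigma_D^2 + 2\,\sigma_R\,\sigma_D\,r_{RD}}},

so the retained-segment validity r_{RY} exceeds the full-test validity r_{XY} when r_{DY} is relatively small while r_{RD} is relatively large, which is the condition summarized in the abstract.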
Peer reviewed
Mehrens, William A. – Applied Measurement in Education, 2000
Presents conclusions of an independent measurement expert that the Texas Assessment of Academic Skills (TAAS) was constructed according to acceptable professional standards and tests curricular material considered by the Texas Board of Education important for graduates to have mastered. Also supports the validity and reliability of the TAAS and…
Descriptors: Curriculum, Psychometrics, Reliability, Standards
Peer reviewed
Kane, Michael – Applied Measurement in Education, 1996
This overview of the role of error and tolerance for error in measurement asserts that the generic precision associated with a measurement procedure is defined as the root mean square error, or standard error, in some relevant population. This view of precision is explored in several applications of measurement. (SLD)
Descriptors: Error of Measurement, Error Patterns, Generalizability Theory, Measurement Techniques
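In the usual notation, with observed score X and true (universe) score T, the generic precision described here is

\text{RMSE} = \sqrt{E\!\left[(X - T)^2\right]} = \sigma_E,

which in classical test theory is the familiar standard error of measurement, \sigma_E = \sigma_X\sqrt{1 - \rho_{XX'}}, for the relevant population. (Standard definitions supplied for context; the article's concern is how much error can be tolerated in particular applications.)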
Peer reviewed
Wise, Steven L.; Kong, Xiaojing – Applied Measurement in Education, 2005
When low-stakes assessments are administered, the degree to which examinees give their best effort is often unclear, complicating the validity and interpretation of the resulting test scores. This study introduces a new method, based on item response time, for measuring examinee test-taking effort on computer-based test items. This measure, termed…
Descriptors: Psychometrics, Validity, Reaction Time, Test Items
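The measure introduced here, response time effort (RTE), is in broad terms the proportion of a test's items on which the examinee's response time indicates solution behavior rather than rapid guessing. A minimal Python sketch of that proportion follows; the per-item thresholds are an input, and the 3-second default is purely an assumption for illustration, not a value from the article.

def response_time_effort(times, thresholds=None, default_threshold=3.0):
    """times: one examinee's item response times in seconds.
    thresholds: optional per-item solution-behavior thresholds (seconds)."""
    if thresholds is None:
        thresholds = [default_threshold] * len(times)
    # Solution behavior: response time at or above the item's threshold.
    solution_behavior = [t >= th for t, th in zip(times, thresholds)]
    # RTE is the proportion of items answered with solution behavior (0 to 1).
    return sum(solution_behavior) / len(times)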