ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	1
Since 2016 (last 10 years)	2
Since 2006 (last 20 years)	6

Descriptor

Weighted Scores	9
Correlation	3
Error of Measurement	3
Item Response Theory	3
Scoring	3
Equated Scores	2
Evaluation Methods	2
Item Analysis	2
Licensing Examinations…	2
Probability	2
Reliability	2
Sample Size	2
Sampling	2
Scoring Rubrics	2
Student Evaluation	2
Test Items	2
Test Reliability	2
Validity	2
Academic Ability	1
Academic Standards	1
Achievement Rating	1
Achievement Tests	1
Certification	1
Criterion Referenced Tests	1
Decision Making	1

Source

Applied Measurement in…

Publication Type

Journal Articles	9
Reports - Research	7
Reports - Evaluative	2

Education Level

Higher Education	1
Postsecondary Education	1

Assessments and Surveys

United States Medical…

Showing all 9 results

Change in Engagement during Test Events: An Argument for Weighted Scoring?

Peer reviewed

Steven L. Wise; G. Gage Kingsbury; Meredith L. Langi – Applied Measurement in Education, 2023

Recent research has provided evidence that performance change during a student's test event can indicate the presence of test-taking disengagement. Meaningful performance change implies that some portions of the test event reflect assumed maximum performance better than others and, because disengagement tends to diminish performance,…

Descriptors: Tests, Weighted Scores, Test Wiseness, Scoring

Equating with Small and Unbalanced Samples

Peer reviewed

Goodman, Joshua T.; Dallas, Andrew D.; Fan, Fen – Applied Measurement in Education, 2020

Recent research has suggested that re-setting the standard for each administration of a small sample examination, in addition to the high cost, does not adequately maintain similar performance expectations year after year. Small-sample equating methods have shown promise with samples between 20 and 30. For groups that have fewer than 20 students,…

Descriptors: Equated Scores, Sample Size, Sampling, Weighted Scores

Increasing the Validity of Angoff Standards through Analysis of Judge-Level Internal Consistency

Peer reviewed

Clauser, Jerome C.; Clauser, Brian E.; Hambleton, Ronald K. – Applied Measurement in Education, 2014

The purpose of the present study was to extend past work with the Angoff method for setting standards by examining judgments at the judge level rather than the panel level. The focus was on investigating the relationship between observed Angoff standard setting judgments and empirical conditional probabilities. This relationship has been used as a…

Descriptors: Standard Setting (Scoring), Validity, Reliability, Correlation

Validating Automated Essay Scoring: A (Modest) Refinement of the "Gold Standard"

Peer reviewed

Powers, Donald E.; Escoffery, David S.; Duchnowski, Matthew P. – Applied Measurement in Education, 2015

By far, the most frequently used method of validating (the interpretation and use of) automated essay scores has been to compare them with scores awarded by human raters. Although this practice is questionable, human-machine agreement is still often regarded as the "gold standard." Our objective was to refine this model and apply it to…

Descriptors: Essays, Test Scoring Machines, Program Validation, Criterion Referenced Tests

The Effect of Changing Content on IRT Scaling Methods

Peer reviewed

Keller, Lisa A.; Keller, Robert R. – Applied Measurement in Education, 2015

Equating test forms is an essential activity in standardized testing, with increased importance with the accountability systems in existence through the mandate of Adequate Yearly Progress. It is through equating that scores from different test forms become comparable, which allows for the tracking of changes in the performance of students from…

Descriptors: Item Response Theory, Rating Scales, Standardized Tests, Scoring Rubrics

Impact of Design Effects in Large-Scale District and State Assessments

Peer reviewed

Phillips, Gary W. – Applied Measurement in Education, 2015

This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…

Descriptors: State Programs, Sampling, Research Design, Error of Measurement

The Reliability and Validity of Weighted Composite Scores

Peer reviewed

Kane, Michael; Case, Susan M. – Applied Measurement in Education, 2004

The scores on 2 distinct tests (e.g., essay and objective) are often combined to create a composite score, which is used to make decisions. The validity of the observed composite can sometimes be evaluated relative to an external criterion. However, in cases where no criterion is available, the observed composite has generally been evaluated in…

Descriptors: Validity, Weighted Scores, Reliability, Student Evaluation

Weighting Constructed-Response Items in IRT-Based Exams

Peer reviewed

Sykes, Robert C.; Hou, Liling – Applied Measurement in Education, 2003

Weighting responses to Constructed-Response (CR) items has been proposed as a way to increase the contribution these items make to the test score when there is insufficient testing time to administer additional CR items. The effect of various types of weighting items of an IRT-based mixed-format writing examination was investigated.…

Descriptors: Item Response Theory, Weighted Scores, Responses, Scores

Effects of Empirical Option Weighting on Estimating Domain Scores and Making Pass/Fail Decisions.

Peer reviewed

Haladyna, Thomas M. – Applied Measurement in Education, 1990

Number-correct scoring was compared to empirical option weighting for estimating domain scores and making pass/fail decisions that are typical of certification, licensing, competency, and proficiency testing. The usefulness of the option-total correlation option-weighting method was illustrated with a sample of 1,000 high school students. (SLD)

Descriptors: Achievement Tests, Certification, Correlation, Decision Making

Author

Case, Susan M.	1
Clauser, Brian E.	1
Clauser, Jerome C.	1
Dallas, Andrew D.	1
Duchnowski, Matthew P.	1
Escoffery, David S.	1
Fan, Fen	1
G. Gage Kingsbury	1
Goodman, Joshua T.	1
Haladyna, Thomas M.	1
Hambleton, Ronald K.	1
Hou, Liling	1
Kane, Michael	1
Keller, Lisa A.	1
Keller, Robert R.	1
Meredith L. Langi	1
Phillips, Gary W.	1
Powers, Donald E.	1
Steven L. Wise	1
Sykes, Robert C.	1