ERIC - Search Results

Showing all 7 results

Using a Taxonomy of Differential Step Functioning to Improve the Interpretation of DIF in Polytomous Items: An Illustration

Peer reviewed

Penfield, Randall D.; Alvarez, Karina; Lee, Okhee – Applied Measurement in Education, 2009

The assessment of differential item functioning (DIF) in polytomous items addresses between-group differences in measurement properties at the item level, but typically does not inform which score levels may be involved in the DIF effect. The framework of differential step functioning (DSF) addresses this issue by examining between-group…

Descriptors: Test Bias, Classification, Test Items, Criteria

Can Validity Rise When Reliability Declines?

Peer reviewed

Feldt, Leonard S. – Applied Measurement in Education, 1997

It has often been asserted that the reliability of a measure places an upper limit on its validity. This article demonstrates in theory that validity can rise when reliability declines, even when validity evidence is a correlation with an acceptable criterion. Whether empirical examples can actually be found is an open question. (SLD)

Descriptors: Correlation, Criteria, Reliability, Test Construction

Edwards and Cummings' "Fuzzy Truncation Model" Is a Step in the Right Direction.

Peer reviewed

Holland, Paul W.; Wainer, Howard – Applied Measurement in Education, 1990

The attempt by D. Edwards and C. B. Cummings to adjust state mean Scholastic Aptitude Test scores for differential participation rates with a "fuzzy truncation model" satisfies three criteria the authors previously defined but falls short on two. Omission of sensitivity studies mars the otherwise exemplary study. (SLD)

Descriptors: College Entrance Examinations, Criteria, Higher Education, Participation

Score Resolution: An Investigation of the Reliability and Validity of Resolved Scores

Peer reviewed

Johnson, Robert L.; Penny, Jim; Fisher, Steve; Kuhs, Therese – Applied Measurement in Education, 2003

When raters assign different scores to a performance task, a method for resolving rating differences is required to report a single score to the examinee. Recent studies indicate that decisions about examinees, such as pass/fail decisions, differ across resolution methods. Previous studies also investigated the interrater reliability of…

Descriptors: Test Reliability, Test Validity, Scores, Interrater Reliability

Something Old, Something New, Something Borrowed, a Lot to Do.

Peer reviewed

Berk, Ronald A. – Applied Measurement in Education, 1995

A brief summary of standard setting knowledge is presented, derived from about 20 methods that utilize a judgmental review process, the approach most relevant to the standard-setting strategies proposed in this special issue. Criteria for judging effectiveness and critiques of the methods discussed in the issue are offered. (SLD)

Descriptors: Criteria, Decision Making, Educational History, Evaluation Methods

The "Unbiased" Anchor: Bridging the Gap between DIF and Item Bias.

Peer reviewed

Williams, Valerie S. L. – Applied Measurement in Education, 1997

Item response theory was used to investigate differential item functioning (DIF) in students' expected course grades, which were found to function similarly across sex and race. These grades were incorporated into the matching criterion, enhancing the validity of subgroup comparisons for the third-grade mathematics test taken by 1,050 students.…

Descriptors: Comparative Analysis, Criteria, Elementary School Students, Grade 3

The Performance Domain and the Structure of the Decision Space.

Peer reviewed

Plake, Barbara S. – Applied Measurement in Education, 1995

This article provides a framework for the rest of the articles in this special issue comparing the utility of three standard-setting methods with complex performance assessments. The context of the standard setting study is described, and the methods are outlined. (SLD)

Descriptors: Comparative Analysis, Criteria, Decision Making, Educational Assessment
