DeMars, Christine E. – Applied Measurement in Education, 2011
Three types of effect sizes for differential item functioning (DIF) are described in this exposition: the log of the odds ratio (difference in log-odds), the difference in probability correct, and the proportion of variance accounted for. Using these indices involves conceptualizing the degree of DIF in different ways. This integrative review discusses how these measures are impacted in…
Descriptors: Effect Size, Test Bias, Probability, Difficulty Level
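
As a minimal sketch (not from the article itself), two of the three effect sizes the abstract names can be computed directly from group proportion-correct values; the proportion-correct figures below are hypothetical, and the variance-accounted-for index is omitted since it requires a fitted model.

# Hypothetical proportion-correct values for two groups matched on total score.
import math

p_ref, p_focal = 0.72, 0.61   # reference vs. focal group

# Difference in probability correct
delta_p = p_ref - p_focal

# Log of the odds ratio (difference in log-odds)
log_odds_ratio = (math.log(p_ref / (1 - p_ref))
                  - math.log(p_focal / (1 - p_focal)))

print(f"difference in probability correct: {delta_p:.3f}")
print(f"log odds ratio: {log_odds_ratio:.3f}")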
Meyers, Jason L.; Miller, G. Edward; Way, Walter D. – Applied Measurement in Education, 2009
In operational testing programs using item response theory (IRT), item parameter invariance is threatened when an item appears in a different location on the live test than it did when it was field tested. This study utilizes data from a large state's assessments to model change in Rasch item difficulty (RID) as a function of item position change,…
Descriptors: Test Items, Test Content, Testing Programs, Simulation
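
A hypothetical sketch of the kind of model the abstract describes: regressing the change in Rasch item difficulty (field test to live test) on the change in item position. All data values here are invented for illustration.

import numpy as np

# Live-test position minus field-test position, per item
position_change = np.array([-20, -10, -5, 0, 5, 10, 20])
# Corresponding change in Rasch item difficulty, in logits
rid_change = np.array([-0.15, -0.08, -0.02, 0.01, 0.04, 0.09, 0.17])

slope, intercept = np.polyfit(position_change, rid_change, 1)
print(f"estimated difficulty drift per position moved: {slope:.4f} logits")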
Clauser, Brian E.; Harik, Polina; Margolis, Melissa J.; McManus, I. C.; Mollon, Jennifer; Chis, Liliana; Williams, Simon – Applied Measurement in Education, 2009
Numerous studies have compared the Angoff standard-setting procedure to other standard-setting methods, but relatively few studies have evaluated the procedure based on internal criteria. This study uses a generalizability theory framework to evaluate the stability of the estimated cut score. To provide a measure of internal consistency, this…
Descriptors: Generalizability Theory, Group Discussion, Standard Setting (Scoring), Scoring
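
The following is a deliberately simplified, hypothetical illustration of the idea behind a generalizability-theory evaluation of cut-score stability: the standard error of the panel's mean Angoff rating shrinks as judges are added. A full G-study would decompose more variance components (items, rounds, interactions); the ratings below are invented.

import numpy as np

# rows = judges, cols = items; entries = Angoff probability judgments
ratings = np.array([
    [0.60, 0.55, 0.70, 0.65],
    [0.62, 0.50, 0.68, 0.60],
    [0.70, 0.58, 0.75, 0.66],
])

judge_means = ratings.mean(axis=1)   # each judge's implied cut score
cut_score = judge_means.mean()
se_cut = judge_means.std(ddof=1) / np.sqrt(len(judge_means))
print(f"cut score: {cut_score:.3f}   SE (judge facet only): {se_cut:.3f}")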

Lunz, Mary E.; And Others – Applied Measurement in Education, 1990
An extension of the Rasch model is used to obtain objective measurements for examinations graded by judges. The model calibrates the elements of each facet of the examination on a common log-linear scale. Real examination data illustrate how correcting for judge severity improves the fairness of examinee measures.
Descriptors: Certification, Difficulty Level, Interrater Reliability, Judges
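
A minimal sketch of the many-facet Rasch idea the abstract extends: the log-odds of success is modeled as examinee ability minus item difficulty minus judge severity, all on one logit scale. The parameter values are hypothetical.

import math

def p_success(ability, difficulty, severity):
    """Probability of success under a dichotomous facets model (logits)."""
    logit = ability - difficulty - severity
    return 1 / (1 + math.exp(-logit))

# Same examinee and item, lenient vs. severe judge:
print(p_success(1.0, 0.2, -0.5))  # lenient judge -> higher expected score
print(p_success(1.0, 0.2, 0.8))   # severe judge -> lower expected score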

Green, Donald Ross; And Others – Applied Measurement in Education, 1989
Potential benefits of using item response theory in test construction are evaluated in light of nine years of experience and evidence from applying a three-parameter model to the development of major achievement batteries. Topics addressed include error of measurement, test equating, item bias, and item difficulty.
Descriptors: Achievement Tests, Computer Assisted Testing, Difficulty Level, Equated Scores
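
For reference, the three-parameter logistic (3PL) model the abstract refers to gives the probability of a correct response from discrimination a, difficulty b, and pseudo-guessing c; the parameter values below are illustrative only.

import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL IRT model."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

print(p_3pl(theta=0.0, a=1.2, b=-0.5, c=0.2))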

Feldt, Leonard S. – Applied Measurement in Education, 1993
The recommendation that the reliability of multiple-choice tests is enhanced when item difficulties are concentrated near 0.50 is reinforced and extended in this article by viewing the 0/1 item scoring as a dichotomization of an underlying normally distributed ability score.
Descriptors: Ability, Difficulty Level, Guessing (Tests), Mathematical Models
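
A small numeric check of the intuition behind that recommendation (not Feldt's full derivation, which brings in guessing and the latent normal ability): a 0/1 item's score variance p(1 - p) is largest at p = 0.50, which, other things equal, supports higher total-score reliability. The difficulty values are illustrative.

for p in (0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}   item variance = {p * (1 - p):.3f}")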