ERIC - Search Results

Publication Date

In 2025	0
Since 2024	1
Since 2021 (last 5 years)	1
Since 2016 (last 10 years)	1
Since 2006 (last 20 years)	11

Descriptor

Evaluation Methods	12
Item Response Theory	12
Testing Programs	12
Equated Scores	5
Item Analysis	4
Psychometrics	4
Standardized Tests	4
Test Items	4
College Entrance Examinations	3
Cutting Scores	3
Mathematics Tests	3
Measurement Techniques	3
Sampling	3
Scoring	3
Achievement Tests	2
Evaluation Research	2
Gender Differences	2
Law Schools	2
Mathematics Achievement	2
Measures (Individuals)	2
Multiple Choice Tests	2
Racial Differences	2
Science Achievement	2
Scores	2
Student Evaluation	2
More ▼

Source

Applied Measurement in…	3
Applied Psychological…	3
ACT, Inc.	1
Anatomical Sciences Education	1
Educational and Psychological…	1
Journal of Applied Testing…	1
Journal of Educational…	1
Online Submission	1

Publication Type

Journal Articles	10
Reports - Research	6
Reports - Evaluative	3
Information Analyses	2
Numerical/Quantitative Data	1
Opinion Papers	1
Reports - Descriptive	1
Speeches/Meeting Papers	1

Education Level

Higher Education	5
Elementary Secondary Education	2
Grade 4	2
Grade 6	2
Grade 8	2
Postsecondary Education	2
Adult Education	1
Grade 10	1
Grade 11	1
Grade 3	1
Grade 5	1
Grade 7	1
Grade 9	1
High Schools	1
Secondary Education	1
More ▼

Audience

Location

Laws, Policies, & Programs

Assessments and Surveys

Graduate Record Examinations

What Works Clearinghouse Rating

Showing all 12 results Save | Export

Traditional vs Intersectional DIF Analysis: Considerations and a Comparison Using State Testing Data

Peer reviewed

Direct link

Tony Albano; Brian F. French; Thao Thu Vo – Applied Measurement in Education, 2024

Recent research has demonstrated an intersectional approach to the study of differential item functioning (DIF). This approach expands DIF to account for the interactions between what have traditionally been treated as separate grouping variables. In this paper, we compare traditional and intersectional DIF analyses using data from a state testing…

Descriptors: Test Items, Item Analysis, Data Use, Standardized Tests

Using Rasch Measurement to Score, Evaluate, and Improve Examinations in an Anatomy Course

Peer reviewed

Direct link

Royal, Kenneth D.; Gilliland, Kurt O.; Kernick, Edward T. – Anatomical Sciences Education, 2014

Any examination that involves moderate to high stakes implications for examinees should be psychometrically sound and legally defensible. Currently, there are two broad and competing families of test theories that are used to score examination data. The majority of instructors outside the high-stakes testing arena rely on classical test theory…

Descriptors: Item Response Theory, Scoring, Evaluation Methods, Anatomy

Multilevel Modeling of Item Position Effects

Peer reviewed

Direct link

Albano, Anthony D. – Journal of Educational Measurement, 2013

In many testing programs it is assumed that the context or position in which an item is administered does not have a differential effect on examinee responses to the item. Violations of this assumption may bias item response theory estimates of item and person parameters. This study examines the potentially biasing effects of item position. A…

Descriptors: Test Items, Item Response Theory, Test Format, Questioning Techniques

Impact of Design Effects in Large-Scale District and State Assessments

Peer reviewed

Direct link

Phillips, Gary W. – Applied Measurement in Education, 2015

This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…

Descriptors: State Programs, Sampling, Research Design, Error of Measurement

Evaluating the Effects of Differences in Group Abilities on the Tucker and the Levine Observed-Score Methods for Common-Item Nonequivalent Groups Equating. ACT Research Report Series 2010-1

Download full text

Chen, Hanwei; Cui, Zhongmin; Zhu, Rongchun; Gao, Xiaohong – ACT, Inc., 2010

The most critical feature of a common-item nonequivalent groups equating design is that the average score difference between the new and old groups can be accurately decomposed into a group ability difference and a form difficulty difference. Two widely used observed-score linear equating methods, the Tucker and the Levine observed-score methods,…

Descriptors: Equated Scores, Groups, Ability Grouping, Difficulty Level

Detecting and Correcting Scale Drift in Test Equating: An Illustration from a Large Scale Testing Program

Peer reviewed

Direct link

Puhan, Gautam – Applied Measurement in Education, 2009

The purpose of this study is to determine the extent of scale drift on a test that employs cut scores. It was essential to examine scale drift for this testing program because new forms in this testing program are often put on scale through a series of intermediate equatings (known as equating chains). This process may cause equating error to…

Descriptors: Testing Programs, Testing, Measurement Techniques, Item Response Theory

Construct Equivalence across Grades in a Vertical Scale for a K-12 Large-Scale Reading Assessment

Peer reviewed

Direct link

Wang, Shudong; Jiao, Hong – Educational and Psychological Measurement, 2009

In practice, vertical scales have been continually used to measure students' achievement progress across several grade levels and have been considered very challenging psychometric procedures. Recently, such practices have been drawing many criticisms. The major criticisms focus on dimensionality and construct equivalence of the latent trait or…

Descriptors: Reading Comprehension, Elementary Secondary Education, Measures (Individuals), Psychometrics

Invariance of Score Linkings across Gender Groups for Forms of a Testlet-Based College-Level Examination Program Examination

Peer reviewed

Direct link

Yang, Wen-Ling; Gao, Rui – Applied Psychological Measurement, 2008

This study investigates whether the functions linking number-correct scores to the College-Level Examination Program (CLEP) scaled scores remain invariant over gender groups, using test data on the 16 testlet-based forms of the CLEP College Algebra exam. To be consistent with the operational practice, linking of various test forms to a common…

Descriptors: Mathematics Tests, Algebra, Item Response Theory, Testing Programs

Matching the Judgmental Task with Standard Setting Panelist Expertise: The Item-Descriptor (ID) Matching Method

Peer reviewed

Direct link

Ferrara, Steve; Perie, Marianne; Johnson, Eugene – Journal of Applied Testing Technology, 2008

Psychometricians continue to introduce new approaches to setting cut scores for educational assessments in an attempt to improve on current methods. In this paper we describe the Item-Descriptor (ID) Matching method, a method based on IRT item mapping. In ID Matching, test content area experts match items (i.e., their judgments about the knowledge…

Descriptors: Test Results, Test Content, Testing Programs, Educational Testing

A Discussion of Population Invariance

Peer reviewed

Direct link

Brennan, Robert L. – Applied Psychological Measurement, 2008

The discussion here covers five articles that are linked in the sense that they all treat population invariance. This discussion of population invariance is a somewhat broader treatment of the subject than simply a discussion of these five articles. In particular, occasional reference is made to publications other than those in this issue. The…

Descriptors: Advanced Placement, Law Schools, Science Achievement, Achievement Tests

A Discussion of Population Invariance of Equating

Peer reviewed

Direct link

Petersen, Nancy S. – Applied Psychological Measurement, 2008

This article discusses the five studies included in this issue. Each article addressed the same topic, population invariance of equating. They all used data from major standardized testing programs, and they all used essentially the same statistics to evaluate their results, namely, the root mean square difference and root expected mean square…

Descriptors: Testing Programs, Standardized Tests, Equated Scores, Evaluation Methods

The Effect of Year-to-Year Rater Variation on IRT Linking

Download full text

Yen, Shu Jing; Ochieng, Charles; Michaels, Hillary; Friedman, Greg – Online Submission, 2005

Year-to-year rater variation may result in constructed response (CR) parameter changes, making CR items inappropriate to use in anchor sets for linking or equating. This study demonstrates how rater severity affected the writing and reading scores. Rater adjustments were made to statewide results using an item response theory (IRT) methodology…

Descriptors: Test Items, Writing Tests, Reading Tests, Measures (Individuals)

Albano, Anthony D.	1
Brennan, Robert L.	1
Brian F. French	1
Chen, Hanwei	1
Cui, Zhongmin	1
Ferrara, Steve	1
Friedman, Greg	1
Gao, Rui	1
Gao, Xiaohong	1
Gilliland, Kurt O.	1
Jiao, Hong	1
Johnson, Eugene	1
Kernick, Edward T.	1
Michaels, Hillary	1
Ochieng, Charles	1
Perie, Marianne	1
Petersen, Nancy S.	1
Phillips, Gary W.	1
Puhan, Gautam	1
Royal, Kenneth D.	1
Thao Thu Vo	1
Tony Albano	1
Wang, Shudong	1
Yang, Wen-Ling	1
Yen, Shu Jing	1
More ▼