Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 4 |
Since 2006 (last 20 years) | 12 |
Descriptor
Evaluation Methods | 19 |
Test Items | 19 |
Item Response Theory | 12 |
Simulation | 8 |
Measurement Techniques | 6 |
Test Bias | 6 |
Comparative Analysis | 5 |
Error Patterns | 3 |
Monte Carlo Methods | 3 |
Research Methodology | 3 |
Scores | 3 |
Source
Applied Measurement in Education | 19 |
Author
Penfield, Randall D. | 2 |
Su, Ya-Hui | 2 |
Wang, Wen-Chung | 2 |
Ackerman, Terry A. | 1 |
Ban, Jae-Chun | 1 |
Berberoglu, Giray | 1 |
Bolt, Daniel M. | 1 |
Boughton, Keith A. | 1 |
French, Brian F. | 1 |
Chen, Yu-Jen | 1 |
Cheng, Chien-Fen | 1 |
Publication Type
Journal Articles | 19 |
Reports - Research | 12 |
Reports - Evaluative | 7 |
Information Analyses | 1 |
Education Level
High Schools | 2 |
Elementary Secondary Education | 1 |
Grade 11 | 1 |
Secondary Education | 1 |
Assessments and Surveys
ACT Assessment | 1 |
Trends in International Mathematics and Science Study (TIMSS) | 1 |
Traditional vs Intersectional DIF Analysis: Considerations and a Comparison Using State Testing Data
Albano, Tony; French, Brian F.; Vo, Thao Thu – Applied Measurement in Education, 2024
Recent research has demonstrated an intersectional approach to the study of differential item functioning (DIF). This approach expands DIF analysis to account for interactions among what have traditionally been treated as separate grouping variables. In this paper, we compare traditional and intersectional DIF analyses using data from a state testing…
Descriptors: Test Items, Item Analysis, Data Use, Standardized Tests
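To make the contrast concrete, here is a minimal sketch of one way a traditional and an intersectional DIF screen can differ, using a Mantel-Haenszel test on simulated data. The grouping variables, matching score, and effect size are all invented; this is not the authors' analysis. DIF confined to one gender-by-race cell is diluted in the one-variable analyses but visible in the cell-level comparison.

```python
# Sketch: intersectional vs. traditional DIF via Mantel-Haenszel.
# All data are simulated; grouping variables are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 4000
gender = rng.integers(0, 2, n)
race = rng.integers(0, 2, n)
theta = rng.normal(0, 1, n)
# Studied item disadvantages only the (gender=1, race=1) intersection:
dif = -0.6 * ((gender == 1) & (race == 1))
item = rng.binomial(1, 1 / (1 + np.exp(-(theta + dif))))
# Crude matching score standing in for the rest of the test:
total = np.clip(np.round(theta * 3 + 10), 0, 20).astype(int)

def mh_log_odds(item, focal, total):
    """Mantel-Haenszel common log-odds ratio, stratified on total score."""
    num = den = 0.0
    for k in np.unique(total):
        s = total == k
        ref, foc = item[s & ~focal], item[s & focal]
        if len(ref) == 0 or len(foc) == 0:
            continue
        a, b = ref.sum(), len(ref) - ref.sum()   # reference right / wrong
        c, d = foc.sum(), len(foc) - foc.sum()   # focal right / wrong
        t = a + b + c + d
        num += a * d / t
        den += b * c / t
    return np.log(num / den)

# Traditional DIF: one grouping variable at a time.
print("gender only:", mh_log_odds(item, gender == 1, total))
print("race only:  ", mh_log_odds(item, race == 1, total))
# Intersectional DIF: each gender-by-race cell vs. the (0, 0) cell.
for g in (0, 1):
    for r in (0, 1):
        if (g, r) == (0, 0):
            continue
        keep = ((gender == 0) & (race == 0)) | ((gender == g) & (race == r))
        focal = (gender == g) & (race == r)
        print(f"cell ({g},{r}):",
              mh_log_odds(item[keep], focal[keep], total[keep]))
```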
Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020
Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods was investigated in the context of very small samples (N = 10). Overall, nominal…
Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores
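For readers unfamiliar with the nominal weights mean method, the sketch below shows its general flavor as a Tucker-style mean equating whose regression slopes are replaced by test-length ratios. The formulas are a reconstruction under that reading, and the data are simulated, not drawn from the study.

```python
# Sketch of nominal weights mean equating on an anchor, as reconstructed:
# Tucker-style mean equating with the regression slopes replaced by
# test-length ratios (gamma = n_items / n_anchor). Details are assumptions.
import numpy as np

def nominal_weights_mean(x, v1, y, v2, n_x, n_y, n_v, w1=0.5):
    """Constant added to Form X scores to place them on the Form Y scale."""
    w2 = 1.0 - w1
    gamma_x, gamma_y = n_x / n_v, n_y / n_v     # nominal weights
    anchor_gap = np.mean(v1) - np.mean(v2)      # group difference on anchor
    mu_sx = np.mean(x) - w2 * gamma_x * anchor_gap   # synthetic mean, X
    mu_sy = np.mean(y) + w1 * gamma_y * anchor_gap   # synthetic mean, Y
    return mu_sy - mu_sx

rng = np.random.default_rng(1)
# Very small samples (N = 10 per form), mimicking the study's setting.
x, v1 = rng.binomial(40, 0.60, 10), rng.binomial(10, 0.62, 10)
y, v2 = rng.binomial(40, 0.55, 10), rng.binomial(10, 0.58, 10)
shift = nominal_weights_mean(x, v1, y, v2, n_x=40, n_y=40, n_v=10)
print("Form X score 25 -> Form Y scale:", 25 + shift)
```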
George, Ann Cathrice; Robitzsch, Alexander – Applied Measurement in Education, 2018
This article presents a new perspective on measuring gender differences in the large-scale assessment study Trends in International Mathematics and Science Study (TIMSS). The suggested empirical model is directly based on the theoretical competence model of the mathematics domain and thus includes the interaction between content and cognitive sub-competencies.…
Descriptors: Achievement Tests, Elementary Secondary Education, Mathematics Achievement, Mathematics Tests
Dadey, Nathan; Lyons, Susan; DePascale, Charles – Applied Measurement in Education, 2018
Evidence of comparability is generally needed whenever there are variations in the conditions of an assessment administration, including variations introduced by administering an assessment on multiple digital devices (e.g., tablet, laptop, desktop). This article provides a comprehensive examination of issues relevant to the…
Descriptors: Evaluation Methods, Computer Assisted Testing, Educational Technology, Technology Uses in Education
Lee, Won-Chan; Ban, Jae-Chun – Applied Measurement in Education, 2010
Various applications of item response theory often require linking to achieve a common scale for item parameter estimates obtained from different groups. This article used a simulation to examine the relative performance of four different item response theory (IRT) linking procedures in a random groups equating design: concurrent calibration with…
Descriptors: Item Response Theory, Simulation, Comparative Analysis, Measurement Techniques
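As a point of reference for the linking procedures compared above, here is a minimal sketch of mean/sigma linking, the simplest of the moment-based methods: estimate the linear scale transformation from the means and standard deviations of the common items' difficulty estimates. All numbers are simulated.

```python
# Mean/sigma IRT linking on simulated difficulty estimates (illustrative).
import numpy as np

rng = np.random.default_rng(2)
b_true = rng.normal(0.0, 1.0, 30)           # common items' true difficulties
b_base = b_true + rng.normal(0, 0.1, 30)    # estimates on the base scale
# Estimates from a separate calibration, on a linearly shifted scale:
b_new = 1.2 * b_true + 0.5 + rng.normal(0, 0.1, 30)

# Mean/sigma linking: find A, B with  b_base ~ A * b_new + B.
A = b_base.std(ddof=1) / b_new.std(ddof=1)
B = b_base.mean() - A * b_new.mean()
b_linked = A * b_new + B                    # new-form difficulties, base scale
# Discriminations transform as a / A, abilities as A * theta + B.
print(f"A = {A:.3f}, B = {B:.3f}")
print("RMSE after linking:", np.sqrt(np.mean((b_linked - b_base) ** 2)))
```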
Finch, Holmes; Stage, Alan Kirk; Monahan, Patrick – Applied Measurement in Education, 2008
A primary assumption underlying several common methods for modeling item response data is unidimensionality, that is, that test items tap into only one latent trait. This assumption can be assessed in several ways, including nonlinear factor analysis and DETECT, a method based on the item conditional covariances. When multidimensionality is identified,…
Descriptors: Test Items, Factor Analysis, Item Response Theory, Comparative Analysis
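The conditional-covariance idea behind DETECT can be shown in a few lines: under unidimensionality, item-pair covariances conditional on ability (proxied here by the rest score) are near zero, while multidimensionality leaves positive within-cluster and negative between-cluster values. This is a simplified illustration, not the DETECT program itself.

```python
# Conditional covariances for a two-dimensional item bank (simulated data).
import numpy as np

rng = np.random.default_rng(3)
n, J = 2000, 20
traits = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=n)
load = np.array([0] * 10 + [1] * 10)        # items 0-9 -> trait 1, 10-19 -> trait 2
b = rng.normal(0, 1, J)
p = 1 / (1 + np.exp(-(traits[:, load] - b)))
X = rng.binomial(1, p)                      # n x J response matrix

def cond_cov(X, i, j):
    """Average covariance of items i, j conditional on their rest score."""
    rest = X.sum(axis=1) - X[:, i] - X[:, j]
    acc, total_w = 0.0, 0
    for k in np.unique(rest):
        s = rest == k
        if s.sum() < 2:
            continue
        acc += s.sum() * np.cov(X[s, i], X[s, j])[0, 1]
        total_w += s.sum()
    return acc / total_w

within = np.mean([cond_cov(X, 0, 1), cond_cov(X, 10, 11)])
between = cond_cov(X, 0, 10)
print(f"within-cluster ccov {within:+.4f}, between-cluster {between:+.4f}")
```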
Puhan, Gautam – Applied Measurement in Education, 2009
The purpose of this study is to determine the extent of scale drift on a test that employs cut scores. Examining scale drift was essential for this testing program because new forms are often put on scale through a series of intermediate equatings (known as equating chains). This process may cause equating error to…
Descriptors: Testing Programs, Testing, Measurement Techniques, Item Response Theory
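A toy model makes the equating-chain concern tangible: if each link in the chain is the identity plus a small random slope and intercept error, the drift at a cut score grows with chain length even though every individual equating looks acceptable. This illustrates the mechanism only; it is not Puhan's analysis, and the error magnitudes are invented.

```python
# How scale drift can accumulate through an equating chain (toy model).
import numpy as np

rng = np.random.default_rng(4)
cut_score, n_links, n_reps = 70.0, 8, 2000
drift = np.zeros((n_reps, n_links))
for rep in range(n_reps):
    x = cut_score
    for link in range(n_links):
        a = 1 + rng.normal(0, 0.01)   # slope error in one equating
        b = rng.normal(0, 0.3)        # intercept error in one equating
        x = a * x + b                 # pass the cut score down the chain
        drift[rep, link] = x - cut_score
print("SD of drift at the cut score, by chain length:")
print(np.round(drift.std(axis=0), 2))
```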
Penfield, Randall D. – Applied Measurement in Education, 2007
A widely used approach for categorizing the level of differential item functioning (DIF) in dichotomous items is the scheme proposed by Educational Testing Service (ETS) based on a transformation of the Mantel-Haenszel common odds ratio. In this article, two classification schemes for DIF in polytomous items (referred to as the P1 and P2 schemes)…
Descriptors: Simulation, Educational Testing, Test Bias, Evaluation Methods
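The ETS scheme the abstract builds on classifies a dichotomous item from the Mantel-Haenszel common odds ratio alpha via delta_MH = -2.35 ln(alpha). The sketch below applies only the effect-size thresholds (A below 1, B from 1 to 1.5, C at 1.5 and above, in absolute value) and omits the accompanying significance tests that the full rules require.

```python
# ETS A/B/C DIF categories from the MH common odds ratio (thresholds only).
import numpy as np

def ets_category(alpha_mh):
    """Return the ETS DIF category and delta_MH for an MH odds ratio."""
    delta = -2.35 * np.log(alpha_mh)
    if abs(delta) < 1.0:
        return "A (negligible)", delta
    if abs(delta) < 1.5:
        return "B (moderate)", delta
    return "C (large)", delta

for alpha in (0.95, 1.75, 2.30):
    cat, delta = ets_category(alpha)
    print(f"alpha = {alpha:.2f} -> delta_MH = {delta:+.2f} -> {cat}")
```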
Sun, Koun-Tem; Chen, Yu-Jen; Tsai, Shu-Yen; Cheng, Chien-Fen – Applied Measurement in Education, 2008
In educational measurement, the construction of parallel test forms is often a combinatorial optimization problem that involves the time-consuming selection of items to construct tests having approximately the same test information functions (TIFs) and constraints. This article proposes a novel method based on a genetic algorithm (GA) to construct parallel…
Descriptors: Test Format, Measurement Techniques, Equations (Mathematics), Item Response Theory
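A toy genetic algorithm shows the approach the abstract describes: choose a fixed number of items from a bank so the assembled form's test information function (TIF) tracks a target TIF. The bank, target, and GA settings below are invented for illustration and are much smaller than a realistic assembly problem.

```python
# Toy GA for parallel test assembly against a target TIF (2PL items).
import numpy as np

rng = np.random.default_rng(5)
BANK, LEN, POP, GENS = 200, 40, 60, 150
a = rng.lognormal(0, 0.3, BANK)             # 2PL discriminations
b = rng.normal(0, 1, BANK)                  # 2PL difficulties
grid = np.linspace(-3, 3, 13)

def tif(items):
    """Test information of the selected items at each grid point."""
    p = 1 / (1 + np.exp(-a[items, None] * (grid - b[items, None])))
    return (a[items, None] ** 2 * p * (1 - p)).sum(axis=0)

target = tif(rng.choice(BANK, LEN, replace=False))   # TIF of a reference form

def fitness(items):
    return -np.sum((tif(items) - target) ** 2)

pop = [rng.choice(BANK, LEN, replace=False) for _ in range(POP)]
for gen in range(GENS):
    scores = np.array([fitness(m) for m in pop])
    new_pop = [pop[int(np.argmax(scores))]]          # elitism
    while len(new_pop) < POP:
        # Tournament selection of two parents.
        i, j = rng.integers(0, POP, 2), rng.integers(0, POP, 2)
        p1 = pop[i[0]] if scores[i[0]] > scores[i[1]] else pop[i[1]]
        p2 = pop[j[0]] if scores[j[0]] > scores[j[1]] else pop[j[1]]
        # Crossover: sample a child form from the parents' item pool.
        child = rng.choice(np.union1d(p1, p2), LEN, replace=False)
        if rng.random() < 0.3:                       # mutation: swap one item
            child = child.copy()
            child[rng.integers(0, LEN)] = rng.choice(
                np.setdiff1d(np.arange(BANK), child))
        new_pop.append(child)
    pop = new_pop
best = max(pop, key=fitness)
print("final RMS TIF deviation:", float(np.sqrt(-fitness(best) / len(grid))))
```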
Wells, Craig S.; Bolt, Daniel M. – Applied Measurement in Education, 2008
Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…
Descriptors: Test Length, Test Items, Monte Carlo Methods, Nonparametric Statistics
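A crude stand-in for the nonparametric misfit idea: compare an item's fitted 2PL curve against a kernel-smoothed empirical response curve and use the gap as a misfit index. Douglas and Cohen's procedure and the article's simulations are more involved; this shows only the flavor, and the "fitted" 2PL parameters below are stand-ins rather than real estimates.

```python
# Kernel-smoothed ICC vs. an assumed 2PL curve as a misfit check (sketch).
import numpy as np

rng = np.random.default_rng(6)
n = 5000
theta = rng.normal(size=n)
# The true item has a lower asymptote (guessing), so a 2PL misfits it:
p_true = 0.2 + 0.8 / (1 + np.exp(-1.2 * (theta - 0.5)))
y = rng.binomial(1, p_true)

def p_2pl(t, a, b):
    return 1 / (1 + np.exp(-a * (t - b)))

def kernel_icc(t0, h=0.3):
    """Nadaraya-Watson estimate of P(correct | theta) at t0."""
    w = np.exp(-0.5 * ((theta - t0) / h) ** 2)
    return (w * y).sum() / w.sum()

grid = np.linspace(-2, 2, 21)
smooth = np.array([kernel_icc(t) for t in grid])
fitted = p_2pl(grid, a=0.9, b=-0.1)        # stand-in 2PL "estimates"
rmsd = np.sqrt(np.mean((smooth - fitted) ** 2))
print(f"RMSD between kernel ICC and 2PL curve: {rmsd:.3f}")
```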
D'Agostino, Jerome V.; Welsh, Megan E.; Cimetta, Adriana D.; Falco, Lia D.; Smith, Shannon; VanWinkle, Waverely Hester; Powers, Sonya J. – Applied Measurement in Education, 2008
Central to the standards-based assessment validation process is an examination of the alignment between state standards and test items. Several alignment analysis systems have emerged recently, but most rely on either traditional rating or matching techniques. Few, if any, analyses have been reported on the degree of consistency between the two…
Descriptors: Test Items, Student Evaluation, State Standards, Evaluation Methods
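One natural way to quantify item-level consistency between two alignment techniques (not necessarily the analysis used in the article) is chance-corrected agreement over the standards each method assigns to each item, for example Cohen's kappa. The ratings below are hypothetical.

```python
# Cohen's kappa between two hypothetical item-to-standard alignments.
import numpy as np

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labelings of the same items."""
    labels_a, labels_b = np.asarray(labels_a), np.asarray(labels_b)
    cats = np.unique(np.concatenate([labels_a, labels_b]))
    po = np.mean(labels_a == labels_b)                  # observed agreement
    pe = sum(np.mean(labels_a == c) * np.mean(labels_b == c) for c in cats)
    return (po - pe) / (1 - pe)

# Hypothetical: the standard (1-4) each of 12 items was aligned to.
rating   = [1, 1, 2, 2, 3, 3, 3, 4, 4, 1, 2, 3]
matching = [1, 1, 2, 3, 3, 3, 2, 4, 4, 1, 2, 4]
print(f"kappa = {cohens_kappa(rating, matching):.2f}")
```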
Gierl, Mark J.; Gotzmann, Andrea; Boughton, Keith A. – Applied Measurement in Education, 2004
Differential item functioning (DIF) analyses are used to identify items that operate differently between two groups, after controlling for ability. The Simultaneous Item Bias Test (SIBTEST) is a popular DIF detection method that matches examinees on a true score estimate of ability. However, in some testing situations, like test translation and…
Descriptors: True Scores, Simulation, Test Bias, Student Evaluation
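The statistic at the heart of SIBTEST is a weighted difference between reference- and focal-group performance on the studied item, matched on the remaining items. The sketch below computes only the uncorrected version and deliberately omits SIBTEST's regression correction of the matching score, which is exactly the piece that the true-score matching concerns above are about.

```python
# Uncorrected SIBTEST-style beta-hat on simulated data (illustrative only;
# the real SIBTEST regression-corrects the matching subtest score).
import numpy as np

def naive_sib_beta(studied, rest, focal):
    """sum_k p_k * (mean_ref_k - mean_focal_k), matched on rest score."""
    beta, n_focal = 0.0, focal.sum()
    for k in np.unique(rest):
        s = rest == k
        ref, foc = studied[s & ~focal], studied[s & focal]
        if len(ref) == 0 or len(foc) == 0:
            continue
        beta += (len(foc) / n_focal) * (ref.mean() - foc.mean())
    return beta

rng = np.random.default_rng(7)
n = 3000
focal = rng.random(n) < 0.5
theta = rng.normal(-0.3 * focal, 1)                  # group impact
rest = rng.binomial(30, 1 / (1 + np.exp(-theta)))    # matching subtest score
p_item = 1 / (1 + np.exp(-(theta - 0.4 * focal)))    # DIF against focal group
studied = rng.binomial(1, p_item)
print(f"beta-hat = {naive_sib_beta(studied, rest, focal):.3f}")
```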
Penfield, Randall D. – Applied Measurement in Education, 2006
This study applied the maximum expected information (MEI) and the maximum posterior-weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability…
Descriptors: Bayesian Statistics, Adaptive Testing, Computer Assisted Testing, Test Items
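Here is a minimal sketch of the maximum posterior-weighted information (MPI) rule, written for dichotomous 2PL items to keep it short; Penfield's study applies MEI and MPI to polytomous partial credit items. At each step, administer the item whose information, averaged over the current posterior for ability, is largest. The grid, prior, and item bank are invented.

```python
# MPI item selection in a toy 2PL adaptive test (simplified illustration).
import numpy as np

rng = np.random.default_rng(8)
grid = np.linspace(-4, 4, 81)
prior = np.exp(-0.5 * grid**2); prior /= prior.sum()
a, b = rng.lognormal(0, 0.3, 100), rng.normal(0, 1, 100)  # 100-item bank

def p_correct(i):                 # 2PL response curve on the grid
    return 1 / (1 + np.exp(-a[i] * (grid - b[i])))

def info(i):                      # 2PL item information on the grid
    p = p_correct(i)
    return a[i] ** 2 * p * (1 - p)

posterior, administered = prior.copy(), []
true_theta = 1.0                  # simulee's true ability
for step in range(10):            # a 10-item adaptive test
    available = [i for i in range(100) if i not in administered]
    # MPI: item information weighted by the current posterior over theta.
    pwi = [np.sum(posterior * info(i)) for i in available]
    item = available[int(np.argmax(pwi))]
    x = rng.random() < 1 / (1 + np.exp(-a[item] * (true_theta - b[item])))
    like = p_correct(item) if x else 1 - p_correct(item)
    posterior *= like; posterior /= posterior.sum()
    administered.append(item)
eap = float(np.sum(grid * posterior))
print("administered:", administered, " EAP estimate:", round(eap, 2))
```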
Paek, Insu; Young, Michael J. – Applied Measurement in Education, 2005
When an item response theory (IRT) model is estimated by marginal maximum likelihood, person parameters are usually treated as random parameters that follow an assumed prior distribution, so that the structural parameters of the model can be estimated. For example, both PARSCALE (Muraki & Bock, 1999) and BILOG 3 (Mislevy & Bock,…
Descriptors: Item Response Theory, Test Items, Maximum Likelihood Statistics, Test Bias
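The marginal maximum likelihood setup the abstract refers to integrates the person parameter out of the likelihood against the assumed prior, P(x) = ∫ Π_j P_j(θ)^{x_j} (1 − P_j(θ))^{1−x_j} φ(θ) dθ, usually by quadrature. Below is a minimal sketch with fixed 2PL item parameters and a standard normal prior; the items and response pattern are invented.

```python
# Marginal probability of one response pattern under a N(0,1) prior,
# approximated on a quadrature grid (fixed 2PL item parameters).
import numpy as np

nodes = np.linspace(-4, 4, 61)                   # quadrature grid
weights = np.exp(-0.5 * nodes**2)                # N(0,1) prior, then normalize
weights /= weights.sum()
a = np.array([1.0, 1.3, 0.8])
b = np.array([-0.5, 0.0, 0.7])
x = np.array([1, 1, 0])                          # one response pattern

p = 1 / (1 + np.exp(-a[None, :] * (nodes[:, None] - b[None, :])))
like = np.prod(np.where(x, p, 1 - p), axis=1)    # likelihood at each node
marginal = np.sum(weights * like)                # person parameter marginalized
print(f"marginal probability of pattern {x.tolist()}: {marginal:.4f}")
```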
Wang, Wen-Chung; Su, Ya-Hui – Applied Measurement in Education, 2004
In this study we investigated the effects of the average signed area (ASA) between the item characteristic curves of the reference and focal groups, and of three test purification procedures, on uniform differential item functioning (DIF) detection via the Mantel-Haenszel (M-H) method through Monte Carlo simulations. The results showed that ASA,…
Descriptors: Test Bias, Student Evaluation, Evaluation Methods, Test Items
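The average signed area (ASA) between the reference and focal groups' item characteristic curves can be computed numerically in a few lines; the density weighting and integration range below are assumptions for illustration, not the article's exact definition.

```python
# Density-weighted signed area between two 2PL ICCs (numerical sketch).
import numpy as np

theta = np.linspace(-4, 4, 801)
dens = np.exp(-0.5 * theta**2) / np.sqrt(2 * np.pi)  # N(0,1) weight (assumed)

def icc(a, b):
    return 1 / (1 + np.exp(-a * (theta - b)))

p_ref = icc(1.0, 0.0)
p_foc = icc(1.0, 0.4)      # uniform DIF: focal group finds the item harder
asa = np.sum((p_ref - p_foc) * dens) * (theta[1] - theta[0])
print(f"ASA (density-weighted signed area): {asa:.3f}")
```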