Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 4 |
Since 2006 (last 20 years) | 12 |
Descriptor
Evaluation Methods | 19 |
Test Items | 19 |
Item Response Theory | 12 |
Simulation | 8 |
Measurement Techniques | 6 |
Test Bias | 6 |
Comparative Analysis | 5 |
Error Patterns | 3 |
Monte Carlo Methods | 3 |
Research Methodology | 3 |
Scores | 3 |
Source
Applied Measurement in Education | 19 |
Author
Penfield, Randall D. | 2 |
Su, Ya-Hui | 2 |
Wang, Wen-Chung | 2 |
Ackerman, Terry A. | 1 |
Ban, Jae-Chun | 1 |
Berberoglu, Giray | 1 |
Bolt, Daniel M. | 1 |
Boughton, Keith A. | 1 |
French, Brian F. | 1 |
Chen, Yu-Jen | 1 |
Cheng, Chien-Fen | 1 |
Publication Type
Journal Articles | 19 |
Reports - Research | 12 |
Reports - Evaluative | 7 |
Information Analyses | 1 |
Education Level
High Schools | 2 |
Elementary Secondary Education | 1 |
Grade 11 | 1 |
Secondary Education | 1 |
Assessments and Surveys
ACT Assessment | 1 |
Trends in International Mathematics and Science Study (TIMSS) | 1 |
Traditional vs Intersectional DIF Analysis: Considerations and a Comparison Using State Testing Data
Albano, Tony; French, Brian F.; Vo, Thao Thu – Applied Measurement in Education, 2024
Recent research has demonstrated an intersectional approach to the study of differential item functioning (DIF). This approach expands DIF analysis to account for interactions among what have traditionally been treated as separate grouping variables. In this paper, we compare traditional and intersectional DIF analyses using data from a state testing…
Descriptors: Test Items, Item Analysis, Data Use, Standardized Tests
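To make the contrast concrete, here is a minimal sketch of one way a traditional and an intersectional DIF screen can differ, using a Mantel-Haenszel test on simulated data. The grouping variables, matching score, and effect size are all invented; this is not the authors' analysis. DIF confined to one gender-by-race cell is diluted in the one-variable analyses but visible in the cell-level comparison.

```python
# Sketch: intersectional vs. traditional DIF via Mantel-Haenszel.
# All data are simulated; grouping variables are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 4000
gender = rng.integers(0, 2, n)
race = rng.integers(0, 2, n)
theta = rng.normal(0, 1, n)
# Studied item disadvantages only the (gender=1, race=1) intersection:
dif = -0.6 * ((gender == 1) & (race == 1))
item = rng.binomial(1, 1 / (1 + np.exp(-(theta + dif))))
# Crude matching score standing in for the rest of the test:
total = np.clip(np.round(theta * 3 + 10), 0, 20).astype(int)

def mh_log_odds(item, focal, total):
    """Mantel-Haenszel common log-odds ratio, stratified on total score."""
    num = den = 0.0
    for k in np.unique(total):
        s = total == k
        ref, foc = item[s & ~focal], item[s & focal]
        if len(ref) == 0 or len(foc) == 0:
            continue
        a, b = ref.sum(), len(ref) - ref.sum()   # reference right / wrong
        c, d = foc.sum(), len(foc) - foc.sum()   # focal right / wrong
        t = a + b + c + d
        num += a * d / t
        den += b * c / t
    return np.log(num / den)

# Traditional DIF: one grouping variable at a time.
print("gender only:", mh_log_odds(item, gender == 1, total))
print("race only:  ", mh_log_odds(item, race == 1, total))
# Intersectional DIF: each gender-by-race cell vs. the (0, 0) cell.
for g in (0, 1):
    for r in (0, 1):
        if (g, r) == (0, 0):
            continue
        keep = ((gender == 0) & (race == 0)) | ((gender == g) & (race == r))
        focal = (gender == g) & (race == r)
        print(f"cell ({g},{r}):",
              mh_log_odds(item[keep], focal[keep], total[keep]))
```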
Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020
Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods was investigated in the context of very small samples (N = 10). Overall, nominal…
Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores
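For readers unfamiliar with the nominal weights mean method, the sketch below shows its general flavor as a Tucker-style mean equating whose regression slopes are replaced by test-length ratios. The formulas are a reconstruction under that reading, and the data are simulated, not drawn from the study.

```python
# Sketch of nominal weights mean equating on an anchor, as reconstructed:
# Tucker-style mean equating with the regression slopes replaced by
# test-length ratios (gamma = n_items / n_anchor). Details are assumptions.
import numpy as np

def nominal_weights_mean(x, v1, y, v2, n_x, n_y, n_v, w1=0.5):
    """Constant added to Form X scores to place them on the Form Y scale."""
    w2 = 1.0 - w1
    gamma_x, gamma_y = n_x / n_v, n_y / n_v     # nominal weights
    anchor_gap = np.mean(v1) - np.mean(v2)      # group difference on anchor
    mu_sx = np.mean(x) - w2 * gamma_x * anchor_gap   # synthetic mean, X
    mu_sy = np.mean(y) + w1 * gamma_y * anchor_gap   # synthetic mean, Y
    return mu_sy - mu_sx

rng = np.random.default_rng(1)
# Very small samples (N = 10 per form), mimicking the study's setting.
x, v1 = rng.binomial(40, 0.60, 10), rng.binomial(10, 0.62, 10)
y, v2 = rng.binomial(40, 0.55, 10), rng.binomial(10, 0.58, 10)
shift = nominal_weights_mean(x, v1, y, v2, n_x=40, n_y=40, n_v=10)
print("Form X score 25 -> Form Y scale:", 25 + shift)
```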
George, Ann Cathrice; Robitzsch, Alexander – Applied Measurement in Education, 2018
This article presents a new perspective on measuring gender differences in the large-scale assessment study Trends in International Mathematics and Science Study (TIMSS). The suggested empirical model is directly based on the theoretical competence model of the mathematics domain and thus includes the interaction between content and cognitive sub-competencies.…
Descriptors: Achievement Tests, Elementary Secondary Education, Mathematics Achievement, Mathematics Tests
Dadey, Nathan; Lyons, Susan; DePascale, Charles – Applied Measurement in Education, 2018
Evidence of comparability is generally needed whenever there are variations in the conditions of an assessment administration, including variations introduced by administering an assessment on multiple digital devices (e.g., tablet, laptop, desktop). This article provides a comprehensive examination of issues relevant to the…
Descriptors: Evaluation Methods, Computer Assisted Testing, Educational Technology, Technology Uses in Education
Lee, Won-Chan; Ban, Jae-Chun – Applied Measurement in Education, 2010
Various applications of item response theory often require linking to achieve a common scale for item parameter estimates obtained from different groups. This article used a simulation to examine the relative performance of four different item response theory (IRT) linking procedures in a random groups equating design: concurrent calibration with…
Descriptors: Item Response Theory, Simulation, Comparative Analysis, Measurement Techniques
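As a point of reference for the linking procedures compared above, here is a minimal sketch of mean/sigma linking, the simplest of the moment-based methods: estimate the linear scale transformation from the means and standard deviations of the common items' difficulty estimates. All numbers are simulated.

```python
# Mean/sigma IRT linking on simulated difficulty estimates (illustrative).
import numpy as np

rng = np.random.default_rng(2)
b_true = rng.normal(0.0, 1.0, 30)           # common items' true difficulties
b_base = b_true + rng.normal(0, 0.1, 30)    # estimates on the base scale
# Estimates from a separate calibration, on a linearly shifted scale:
b_new = 1.2 * b_true + 0.5 + rng.normal(0, 0.1, 30)

# Mean/sigma linking: find A, B with  b_base ~ A * b_new + B.
A = b_base.std(ddof=1) / b_new.std(ddof=1)
B = b_base.mean() - A * b_new.mean()
b_linked = A * b_new + B                    # new-form difficulties, base scale
# Discriminations transform as a / A, abilities as A * theta + B.
print(f"A = {A:.3f}, B = {B:.3f}")
print("RMSE after linking:", np.sqrt(np.mean((b_linked - b_base) ** 2)))
```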
Finch, Holmes; Stage, Alan Kirk; Monahan, Patrick – Applied Measurement in Education, 2008
A primary assumption underlying several common methods for modeling item response data is unidimensionality, that is, that test items tap into only one latent trait. This assumption can be assessed in several ways, including nonlinear factor analysis and DETECT, a method based on the item conditional covariances. When multidimensionality is identified,…
Descriptors: Test Items, Factor Analysis, Item Response Theory, Comparative Analysis
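The conditional-covariance idea behind DETECT can be shown in a few lines: under unidimensionality, item-pair covariances conditional on ability (proxied here by the rest score) are near zero, while multidimensionality leaves positive within-cluster and negative between-cluster values. This is a simplified illustration, not the DETECT program itself.

```python
# Conditional covariances for a two-dimensional item bank (simulated data).
import numpy as np

rng = np.random.default_rng(3)
n, J = 2000, 20
traits = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=n)
load = np.array([0] * 10 + [1] * 10)        # items 0-9 -> trait 1, 10-19 -> trait 2
b = rng.normal(0, 1, J)
p = 1 / (1 + np.exp(-(traits[:, load] - b)))
X = rng.binomial(1, p)                      # n x J response matrix

def cond_cov(X, i, j):
    """Average covariance of items i, j conditional on their rest score."""
    rest = X.sum(axis=1) - X[:, i] - X[:, j]
    acc, total_w = 0.0, 0
    for k in np.unique(rest):
        s = rest == k
        if s.sum() < 2:
            continue
        acc += s.sum() * np.cov(X[s, i], X[s, j])[0, 1]
        total_w += s.sum()
    return acc / total_w

within = np.mean([cond_cov(X, 0, 1), cond_cov(X, 10, 11)])
between = cond_cov(X, 0, 10)
print(f"within-cluster ccov {within:+.4f}, between-cluster {between:+.4f}")
```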
Puhan, Gautam – Applied Measurement in Education, 2009
The purpose of this study is to determine the extent of scale drift on a test that employs cut scores. Examining scale drift was essential for this testing program because new forms are often put on scale through a series of intermediate equatings (known as equating chains). This process may cause equating error to…
Descriptors: Testing Programs, Testing, Measurement Techniques, Item Response Theory
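A toy model makes the equating-chain concern tangible: if each link in the chain is the identity plus a small random slope and intercept error, the drift at a cut score grows with chain length even though every individual equating looks acceptable. This illustrates the mechanism only; it is not Puhan's analysis, and the error magnitudes are invented.

```python
# How scale drift can accumulate through an equating chain (toy model).
import numpy as np

rng = np.random.default_rng(4)
cut_score, n_links, n_reps = 70.0, 8, 2000
drift = np.zeros((n_reps, n_links))
for rep in range(n_reps):
    x = cut_score
    for link in range(n_links):
        a = 1 + rng.normal(0, 0.01)   # slope error in one equating
        b = rng.normal(0, 0.3)        # intercept error in one equating
        x = a * x + b                 # pass the cut score down the chain
        drift[rep, link] = x - cut_score
print("SD of drift at the cut score, by chain length:")
print(np.round(drift.std(axis=0), 2))
```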
Penfield, Randall D. – Applied Measurement in Education, 2007
A widely used approach for categorizing the level of differential item functioning (DIF) in dichotomous items is the scheme proposed by Educational Testing Service (ETS) based on a transformation of the Mantel-Haenszel common odds ratio. In this article, two classification schemes for DIF in polytomous items (referred to as the P1 and P2 schemes)…
Descriptors: Simulation, Educational Testing, Test Bias, Evaluation Methods
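The ETS scheme the abstract builds on classifies a dichotomous item from the Mantel-Haenszel common odds ratio alpha via delta_MH = -2.35 ln(alpha). The sketch below applies only the effect-size thresholds (A below 1, B from 1 to 1.5, C at 1.5 and above, in absolute value) and omits the accompanying significance tests that the full rules require.

```python
# ETS A/B/C DIF categories from the MH common odds ratio (thresholds only).
import numpy as np

def ets_category(alpha_mh):
    """Return the ETS DIF category and delta_MH for an MH odds ratio."""
    delta = -2.35 * np.log(alpha_mh)
    if abs(delta) < 1.0:
        return "A (negligible)", delta
    if abs(delta) < 1.5:
        return "B (moderate)", delta
    return "C (large)", delta

for alpha in (0.95, 1.75, 2.30):
    cat, delta = ets_category(alpha)
    print(f"alpha = {alpha:.2f} -> delta_MH = {delta:+.2f} -> {cat}")
```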
Sun, Koun-Tem; Chen, Yu-Jen; Tsai, Shu-Yen; Cheng, Chien-Fen – Applied Measurement in Education, 2008
In educational measurement, the construction of parallel test forms is often a combinatorial optimization problem that involves the time-consuming selection of items to construct tests having approximately the same test information functions (TIFs) and constraints. This article proposes a novel method based on a genetic algorithm (GA) to construct parallel…
Descriptors: Test Format, Measurement Techniques, Equations (Mathematics), Item Response Theory
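A toy genetic algorithm shows the approach the abstract describes: choose a fixed number of items from a bank so the assembled form's test information function (TIF) tracks a target TIF. The bank, target, and GA settings below are invented for illustration and are much smaller than a realistic assembly problem.

```python
# Toy GA for parallel test assembly against a target TIF (2PL items).
import numpy as np

rng = np.random.default_rng(5)
BANK, LEN, POP, GENS = 200, 40, 60, 150
a = rng.lognormal(0, 0.3, BANK)             # 2PL discriminations
b = rng.normal(0, 1, BANK)                  # 2PL difficulties
grid = np.linspace(-3, 3, 13)

def tif(items):
    """Test information of the selected items at each grid point."""
    p = 1 / (1 + np.exp(-a[items, None] * (grid - b[items, None])))
    return (a[items, None] ** 2 * p * (1 - p)).sum(axis=0)

target = tif(rng.choice(BANK, LEN, replace=False))   # TIF of a reference form

def fitness(items):
    return -np.sum((tif(items) - target) ** 2)

pop = [rng.choice(BANK, LEN, replace=False) for _ in range(POP)]
for gen in range(GENS):
    scores = np.array([fitness(m) for m in pop])
    new_pop = [pop[int(np.argmax(scores))]]          # elitism
    while len(new_pop) < POP:
        # Tournament selection of two parents.
        i, j = rng.integers(0, POP, 2), rng.integers(0, POP, 2)
        p1 = pop[i[0]] if scores[i[0]] > scores[i[1]] else pop[i[1]]
        p2 = pop[j[0]] if scores[j[0]] > scores[j[1]] else pop[j[1]]
        # Crossover: sample a child form from the parents' item pool.
        child = rng.choice(np.union1d(p1, p2), LEN, replace=False)
        if rng.random() < 0.3:                       # mutation: swap one item
            child = child.copy()
            child[rng.integers(0, LEN)] = rng.choice(
                np.setdiff1d(np.arange(BANK), child))
        new_pop.append(child)
    pop = new_pop
best = max(pop, key=fitness)
print("final RMS TIF deviation:", float(np.sqrt(-fitness(best) / len(grid))))
```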
Wells, Craig S.; Bolt, Daniel M. – Applied Measurement in Education, 2008
Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…
Descriptors: Test Length, Test Items, Monte Carlo Methods, Nonparametric Statistics
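A crude stand-in for the nonparametric misfit idea: compare an item's fitted 2PL curve against a kernel-smoothed empirical response curve and use the gap as a misfit index. Douglas and Cohen's procedure and the article's simulations are more involved; this shows only the flavor, and the "fitted" 2PL parameters below are stand-ins rather than real estimates.

```python
# Kernel-smoothed ICC vs. an assumed 2PL curve as a misfit check (sketch).
import numpy as np

rng = np.random.default_rng(6)
n = 5000
theta = rng.normal(size=n)
# The true item has a lower asymptote (guessing), so a 2PL misfits it:
p_true = 0.2 + 0.8 / (1 + np.exp(-1.2 * (theta - 0.5)))
y = rng.binomial(1, p_true)

def p_2pl(t, a, b):
    return 1 / (1 + np.exp(-a * (t - b)))

def kernel_icc(t0, h=0.3):
    """Nadaraya-Watson estimate of P(correct | theta) at t0."""
    w = np.exp(-0.5 * ((theta - t0) / h) ** 2)
    return (w * y).sum() / w.sum()

grid = np.linspace(-2, 2, 21)
smooth = np.array([kernel_icc(t) for t in grid])
fitted = p_2pl(grid, a=0.9, b=-0.1)        # stand-in 2PL "estimates"
rmsd = np.sqrt(np.mean((smooth - fitted) ** 2))
print(f"RMSD between kernel ICC and 2PL curve: {rmsd:.3f}")
```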
D'Agostino, Jerome V.; Welsh, Megan E.; Cimetta, Adriana D.; Falco, Lia D.; Smith, Shannon; VanWinkle, Waverely Hester; Powers, Sonya J. – Applied Measurement in Education, 2008
Central to the standards-based assessment validation process is an examination of the alignment between state standards and test items. Several alignment analysis systems have emerged recently, but most rely on either traditional rating or matching techniques. Few, if any, analyses have been reported on the degree of consistency between the two…
Descriptors: Test Items, Student Evaluation, State Standards, Evaluation Methods
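One natural way to quantify item-level consistency between two alignment techniques (not necessarily the analysis used in the article) is chance-corrected agreement over the standards each method assigns to each item, for example Cohen's kappa. The ratings below are hypothetical.

```python
# Cohen's kappa between two hypothetical item-to-standard alignments.
import numpy as np

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labelings of the same items."""
    labels_a, labels_b = np.asarray(labels_a), np.asarray(labels_b)
    cats = np.unique(np.concatenate([labels_a, labels_b]))
    po = np.mean(labels_a == labels_b)                  # observed agreement
    pe = sum(np.mean(labels_a == c) * np.mean(labels_b == c) for c in cats)
    return (po - pe) / (1 - pe)

# Hypothetical: the standard (1-4) each of 12 items was aligned to.
rating   = [1, 1, 2, 2, 3, 3, 3, 4, 4, 1, 2, 3]
matching = [1, 1, 2, 3, 3, 3, 2, 4, 4, 1, 2, 4]
print(f"kappa = {cohens_kappa(rating, matching):.2f}")
```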
Gierl, Mark J.; Gotzmann, Andrea; Boughton, Keith A. – Applied Measurement in Education, 2004
Differential item functioning (DIF) analyses are used to identify items that operate differently between two groups, after controlling for ability. The Simultaneous Item Bias Test (SIBTEST) is a popular DIF detection method that matches examinees on a true score estimate of ability. However, in some testing situations, like test translation and…
Descriptors: True Scores, Simulation, Test Bias, Student Evaluation
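The statistic at the heart of SIBTEST is a weighted difference between reference- and focal-group performance on the studied item, matched on the remaining items. The sketch below computes only the uncorrected version and deliberately omits SIBTEST's regression correction of the matching score, which is exactly the piece that the true-score matching concerns above are about.

```python
# Uncorrected SIBTEST-style beta-hat on simulated data (illustrative only;
# the real SIBTEST regression-corrects the matching subtest score).
import numpy as np

def naive_sib_beta(studied, rest, focal):
    """sum_k p_k * (mean_ref_k - mean_focal_k), matched on rest score."""
    beta, n_focal = 0.0, focal.sum()
    for k in np.unique(rest):
        s = rest == k
        ref, foc = studied[s & ~focal], studied[s & focal]
        if len(ref) == 0 or len(foc) == 0:
            continue
        beta += (len(foc) / n_focal) * (ref.mean() - foc.mean())
    return beta

rng = np.random.default_rng(7)
n = 3000
focal = rng.random(n) < 0.5
theta = rng.normal(-0.3 * focal, 1)                  # group impact
rest = rng.binomial(30, 1 / (1 + np.exp(-theta)))    # matching subtest score
p_item = 1 / (1 + np.exp(-(theta - 0.4 * focal)))    # DIF against focal group
studied = rng.binomial(1, p_item)
print(f"beta-hat = {naive_sib_beta(studied, rest, focal):.3f}")
```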
Penfield, Randall D. – Applied Measurement in Education, 2006
This study applied the maximum expected information (MEI) and the maximum posterior-weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability…
Descriptors: Bayesian Statistics, Adaptive Testing, Computer Assisted Testing, Test Items
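Here is a minimal sketch of the maximum posterior-weighted information (MPI) rule, written for dichotomous 2PL items to keep it short; Penfield's study applies MEI and MPI to polytomous partial credit items. At each step, administer the item whose information, averaged over the current posterior for ability, is largest. The grid, prior, and item bank are invented.

```python
# MPI item selection in a toy 2PL adaptive test (simplified illustration).
import numpy as np

rng = np.random.default_rng(8)
grid = np.linspace(-4, 4, 81)
prior = np.exp(-0.5 * grid**2); prior /= prior.sum()
a, b = rng.lognormal(0, 0.3, 100), rng.normal(0, 1, 100)  # 100-item bank

def p_correct(i):                 # 2PL response curve on the grid
    return 1 / (1 + np.exp(-a[i] * (grid - b[i])))

def info(i):                      # 2PL item information on the grid
    p = p_correct(i)
    return a[i] ** 2 * p * (1 - p)

posterior, administered = prior.copy(), []
true_theta = 1.0                  # simulee's true ability
for step in range(10):            # a 10-item adaptive test
    available = [i for i in range(100) if i not in administered]
    # MPI: item information weighted by the current posterior over theta.
    pwi = [np.sum(posterior * info(i)) for i in available]
    item = available[int(np.argmax(pwi))]
    x = rng.random() < 1 / (1 + np.exp(-a[item] * (true_theta - b[item])))
    like = p_correct(item) if x else 1 - p_correct(item)
    posterior *= like; posterior /= posterior.sum()
    administered.append(item)
eap = float(np.sum(grid * posterior))
print("administered:", administered, " EAP estimate:", round(eap, 2))
```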
Paek, Insu; Young, Michael J. – Applied Measurement in Education, 2005
When an item response theory (IRT) model is estimated by marginal maximum likelihood, person parameters are usually treated as random parameters that follow an assumed prior distribution, so that the structural parameters of the model can be estimated. For example, both PARSCALE (Muraki & Bock, 1999) and BILOG 3 (Mislevy & Bock,…
Descriptors: Item Response Theory, Test Items, Maximum Likelihood Statistics, Test Bias
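The marginal maximum likelihood setup the abstract refers to integrates the person parameter out of the likelihood against the assumed prior, P(x) = ∫ Π_j P_j(θ)^{x_j} (1 − P_j(θ))^{1−x_j} φ(θ) dθ, usually by quadrature. Below is a minimal sketch with fixed 2PL item parameters and a standard normal prior; the items and response pattern are invented.

```python
# Marginal probability of one response pattern under a N(0,1) prior,
# approximated on a quadrature grid (fixed 2PL item parameters).
import numpy as np

nodes = np.linspace(-4, 4, 61)                   # quadrature grid
weights = np.exp(-0.5 * nodes**2)                # N(0,1) prior, then normalize
weights /= weights.sum()
a = np.array([1.0, 1.3, 0.8])
b = np.array([-0.5, 0.0, 0.7])
x = np.array([1, 1, 0])                          # one response pattern

p = 1 / (1 + np.exp(-a[None, :] * (nodes[:, None] - b[None, :])))
like = np.prod(np.where(x, p, 1 - p), axis=1)    # likelihood at each node
marginal = np.sum(weights * like)                # person parameter marginalized
print(f"marginal probability of pattern {x.tolist()}: {marginal:.4f}")
```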
Wang, Wen-Chung; Su, Ya-Hui – Applied Measurement in Education, 2004
In this study we investigated the effects of the average signed area (ASA) between the item characteristic curves of the reference and focal groups, and of three test purification procedures, on uniform differential item functioning (DIF) detection via the Mantel-Haenszel (M-H) method through Monte Carlo simulations. The results showed that ASA,…
Descriptors: Test Bias, Student Evaluation, Evaluation Methods, Test Items
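The average signed area (ASA) between the reference and focal groups' item characteristic curves can be computed numerically in a few lines; the density weighting and integration range below are assumptions for illustration, not the article's exact definition.

```python
# Density-weighted signed area between two 2PL ICCs (numerical sketch).
import numpy as np

theta = np.linspace(-4, 4, 801)
dens = np.exp(-0.5 * theta**2) / np.sqrt(2 * np.pi)  # N(0,1) weight (assumed)

def icc(a, b):
    return 1 / (1 + np.exp(-a * (theta - b)))

p_ref = icc(1.0, 0.0)
p_foc = icc(1.0, 0.4)      # uniform DIF: focal group finds the item harder
asa = np.sum((p_ref - p_foc) * dens) * (theta[1] - theta[0])
print(f"ASA (density-weighted signed area): {asa:.3f}")
```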