NotesFAQContact Us
Collection
Advanced
Search Tips
Audience
Researchers1
Laws, Policies, & Programs
What Works Clearinghouse Rating
Showing 1 to 15 of 82 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Weicong Lyu; Chun Wang; Gongjun Xu – Grantee Submission, 2024
Data harmonization is an emerging approach to strategically combining data from multiple independent studies, enabling addressing new research questions that are not answerable by a single contributing study. A fundamental psychometric challenge for data harmonization is to create commensurate measures for the constructs of interest across…
Descriptors: Data Analysis, Test Items, Psychometrics, Item Response Theory
Peer reviewed Peer reviewed
Direct linkDirect link
Paek, Insu; Lin, Zhongtian; Chalmers, Robert Philip – Educational and Psychological Measurement, 2023
To reduce the chance of Heywood cases or nonconvergence in estimating the 2PL or the 3PL model in the marginal maximum likelihood with the expectation-maximization (MML-EM) estimation method, priors for the item slope parameter in the 2PL model or for the pseudo-guessing parameter in the 3PL model can be used and the marginal maximum a posteriori…
Descriptors: Models, Item Response Theory, Test Items, Intervals
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Inga Laukaityte; Marie Wiberg – Practical Assessment, Research & Evaluation, 2024
The overall aim was to examine effects of differences in group ability and features of the anchor test form on equating bias and the standard error of equating (SEE) using both real and simulated data. Chained kernel equating, Postratification kernel equating, and Circle-arc equating were studied. A college admissions test with four different…
Descriptors: Ability Grouping, Test Items, College Entrance Examinations, High Stakes Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Huang, Qi; Bolt, Daniel M. – Educational and Psychological Measurement, 2023
Previous studies have demonstrated evidence of latent skill continuity even in tests intentionally designed for measurement of binary skills. In addition, the assumption of binary skills when continuity is present has been shown to potentially create a lack of invariance in item and latent ability parameters that may undermine applications. In…
Descriptors: Item Response Theory, Test Items, Skill Development, Robustness (Statistics)
Peer reviewed Peer reviewed
Direct linkDirect link
Cooperman, Allison W.; Weiss, David J.; Wang, Chun – Educational and Psychological Measurement, 2022
Adaptive measurement of change (AMC) is a psychometric method for measuring intra-individual change on one or more latent traits across testing occasions. Three hypothesis tests--a Z test, likelihood ratio test, and score ratio index--have demonstrated desirable statistical properties in this context, including low false positive rates and high…
Descriptors: Error of Measurement, Psychometrics, Hypothesis Testing, Simulation
Peer reviewed Peer reviewed
Direct linkDirect link
Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023
This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…
Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Sahin Kursad, Merve; Cokluk Bokeoglu, Omay; Cikrikci, Rahime Nukhet – International Journal of Assessment Tools in Education, 2022
Item parameter drift (IPD) is the systematic differentiation of parameter values of items over time due to various reasons. If it occurs in computer adaptive tests (CAT), it causes errors in the estimation of item and ability parameters. Identification of the underlying conditions of this situation in CAT is important for estimating item and…
Descriptors: Item Analysis, Computer Assisted Testing, Test Items, Error of Measurement
Peer reviewed Peer reviewed
Direct linkDirect link
Koziol, Natalie A.; Goodrich, J. Marc; Yoon, HyeonJin – Educational and Psychological Measurement, 2022
Differential item functioning (DIF) is often used to examine validity evidence of alternate form test accommodations. Unfortunately, traditional approaches for evaluating DIF are prone to selection bias. This article proposes a novel DIF framework that capitalizes on regression discontinuity design analysis to control for selection bias. A…
Descriptors: Regression (Statistics), Item Analysis, Validity, Testing Accommodations
Peer reviewed Peer reviewed
Direct linkDirect link
Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022
The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…
Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency
Joshua B. Gilbert; James S. Kim; Luke W. Miratrix – Annenberg Institute for School Reform at Brown University, 2024
Longitudinal models of individual growth typically emphasize between-person predictors of change but ignore how growth may vary "within" persons because each person contributes only one point at each time to the model. In contrast, modeling growth with multi-item assessments allows evaluation of how relative item performance may shift…
Descriptors: Vocabulary Development, Item Response Theory, Test Items, Student Development
Peer reviewed Peer reviewed
Direct linkDirect link
Joshua B. Gilbert; James S. Kim; Luke W. Miratrix – Applied Measurement in Education, 2024
Longitudinal models typically emphasize between-person predictors of change but ignore how growth varies "within" persons because each person contributes only one data point at each time. In contrast, modeling growth with multi-item assessments allows evaluation of how relative item performance may shift over time. While traditionally…
Descriptors: Vocabulary Development, Item Response Theory, Test Items, Student Development
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021
Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…
Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
Peer reviewed Peer reviewed
Direct linkDirect link
Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020
Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…
Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling
Bramley, Tom – Research Matters, 2020
The aim of this study was to compare, by simulation, the accuracy of mapping a cut-score from one test to another by expert judgement (using the Angoff method) versus the accuracy with a small-sample equating method (chained linear equating). As expected, the standard-setting method resulted in more accurate equating when we assumed a higher level…
Descriptors: Cutting Scores, Standard Setting (Scoring), Equated Scores, Accuracy
Peer reviewed Peer reviewed
Direct linkDirect link
Yesiltas, Gonca; Paek, Insu – Educational and Psychological Measurement, 2020
A log-linear model (LLM) is a well-known statistical method to examine the relationship among categorical variables. This study investigated the performance of LLM in detecting differential item functioning (DIF) for polytomously scored items via simulations where various sample sizes, ability mean differences (impact), and DIF types were…
Descriptors: Simulation, Sample Size, Item Analysis, Scores
Previous Page | Next Page ยป
Pages: 1  |  2  |  3  |  4  |  5  |  6