Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 0
Since 2016 (last 10 years): 1
Since 2006 (last 20 years): 6
Descriptor
Evaluation Methods: 7
Probability: 7
Simulation: 4
Data Analysis: 3
Item Response Theory: 3
Error Patterns: 2
Error of Measurement: 2
Goodness of Fit: 2
Measurement Techniques: 2
Test Bias: 2
Test Items: 2
Source
Educational and Psychological Measurement: 7
Author
Zumbo, Bruno D.: 2
Beretvas, S. Natasha: 1
Berry, Kenneth J.: 1
Carstensen, Claus H.: 1
Chen, Michelle Y.: 1
Drasgow, Fritz: 1
Kim, Eun Sook: 1
Köhler, Carmen: 1
Lee, HwaYoung: 1
Lee, Taehun: 1
Liu, Yan: 1
Publication Type
Journal Articles: 7
Reports - Research: 4
Book/Product Reviews: 1
Reports - Descriptive: 1
Reports - Evaluative: 1
Education Level
Grade 9: 1
High Schools: 1
Junior High Schools: 1
Middle Schools: 1
Secondary Education: 1
Location
Germany: 1
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
Chen, Michelle Y.; Liu, Yan; Zumbo, Bruno D. – Educational and Psychological Measurement, 2020
This study introduces a novel differential item functioning (DIF) method based on propensity score matching that tackles two challenges in analyzing performance assessment data: continuous task scores and the lack of a reliable internal variable to serve as a proxy for ability or aptitude. The proposed DIF method consists of two main stages. First,…
Descriptors: Probability, Scores, Evaluation Methods, Test Items
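The two-stage design sketched in this abstract lends itself to a short illustration. The following is a minimal sketch on simulated data, assuming external covariates are available for the propensity model; the logistic propensity model, one-to-one nearest-neighbor matching, the paired t-test, and all variable names are illustrative choices, not the authors' exact procedure.

```python
# Hypothetical sketch: a propensity-score-matched DIF check for a
# continuous task score, assuming external covariates are available.
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

n = 400
group = rng.integers(0, 2, n)                         # 0 = reference, 1 = focal
covs = rng.normal(group[:, None] * 0.5, 1.0, (n, 3))  # groups differ on covariates
score = covs.sum(axis=1) + rng.normal(0, 1, n)        # continuous score, no true DIF

# Stage 1 (illustrative): propensity of focal-group membership given covariates.
ps = LogisticRegression().fit(covs, group).predict_proba(covs)[:, 1]

# Stage 2 (illustrative): nearest-neighbor matching on the propensity score,
# then a matched-pairs comparison of the continuous task scores.
ref = np.where(group == 0)[0]
foc = np.where(group == 1)[0]
matched = ref[np.abs(ps[ref][None, :] - ps[foc][:, None]).argmin(axis=1)]
t, p = stats.ttest_rel(score[foc], score[matched])
print(f"matched-pairs t = {t:.2f}, p = {p:.3f}")
```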
Köhler, Carmen; Pohl, Steffi; Carstensen, Claus H. – Educational and Psychological Measurement, 2015
When competence tests are administered, subjects frequently omit items. These missing responses threaten the correct estimation of the proficiency level. Newer model-based approaches aim to account for nonignorable missing-data processes by incorporating a latent missing propensity into the measurement model. Two assumptions are typically…
Descriptors: Competence, Tests, Evaluation Methods, Adults
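As context for the latent-propensity idea, the sketch below only generates data under a nonignorable omission process of the kind described: omission is driven by a latent missing propensity correlated with ability. The correlation value, omission threshold, and item difficulties are assumptions for illustration; the article's models would estimate such quantities.

```python
# Data-generating sketch of nonignorable omissions driven by a latent
# missing propensity correlated with ability (parameter values assumed).
import numpy as np

rng = np.random.default_rng(4)
n, items = 2000, 15
rho = -0.4                        # assumption: abler examinees omit fewer items

# Ability and latent missing propensity drawn jointly.
cov = np.array([[1.0, rho], [rho, 1.0]])
ability, omit_prop = rng.multivariate_normal([0.0, 0.0], cov, n).T

b_item = np.linspace(-1, 1, items)                       # item difficulties
p_correct = 1 / (1 + np.exp(-(ability[:, None] - b_item)))
p_omit = 1 / (1 + np.exp(-(omit_prop[:, None] - 1.5)))   # constant omission threshold

resp = (rng.random((n, items)) < p_correct).astype(float)
resp[rng.random((n, items)) < p_omit] = np.nan           # nonignorable omissions
print(f"omission rate: {np.isnan(resp).mean():.2%}")
```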
Lee, HwaYoung; Beretvas, S. Natasha – Educational and Psychological Measurement, 2014
Conventional differential item functioning (DIF) detection methods (e.g., the Mantel-Haenszel test) can be used to detect DIF only across observed groups, such as gender or ethnicity. However, research has found that DIF is not typically fully explained by an observed variable. True sources of DIF may include unobserved, latent variables, such as…
Descriptors: Item Analysis, Factor Structure, Bayesian Statistics, Goodness of Fit
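Since the abstract takes the conventional Mantel-Haenszel (MH) test as its point of departure, here is a minimal MH DIF check on simulated data, matching on the rest score. The data-generating model and the continuity-corrected statistic follow the textbook formulation; nothing here reflects the authors' latent-variable extension.

```python
# Minimal Mantel-Haenszel DIF test, stratified on the rest score.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)

n, items = 1000, 20
group = rng.integers(0, 2, n)                  # 0 = reference, 1 = focal
theta = rng.normal(0, 1, n)
prob = 1 / (1 + np.exp(-theta[:, None]))       # Rasch-like, b = 0 for all items
resp = (rng.random((n, items)) < prob).astype(int)
studied = resp[:, 0]                           # item tested for DIF
total = resp[:, 1:].sum(axis=1)                # matching variable (rest score)

num = den = var = 0.0
for s in np.unique(total):                     # one 2x2 table per score stratum
    m = total == s
    nk = m.sum()
    r0, r1 = np.sum(m & (group == 0)), np.sum(m & (group == 1))
    c1 = np.sum(m & (studied == 1))
    if nk < 2 or r0 == 0 or r1 == 0:
        continue
    a = np.sum(m & (group == 0) & (studied == 1))   # reference-group correct
    num += a
    den += r0 * c1 / nk                        # E[a] under no DIF
    var += r0 * r1 * c1 * (nk - c1) / (nk**2 * (nk - 1))

mh = (abs(num - den) - 0.5) ** 2 / var         # continuity-corrected MH chi-square
print(f"MH chi-square = {mh:.2f}, p = {chi2.sf(mh, df=1):.3f}")
```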
Kim, Eun Sook; Yoon, Myeongsun; Lee, Taehun – Educational and Psychological Measurement, 2012
Multiple-indicators multiple-causes (MIMIC) modeling is often used to test a latent group mean difference while assuming the equivalence of factor loadings and intercepts over groups. However, this study demonstrated that MIMIC was insensitive to the presence of factor loading noninvariance, which implies that factor loading invariance should be…
Descriptors: Test Items, Simulation, Testing, Statistical Analysis
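A MIMIC sketch helps make the setup concrete: a latent factor measured by three indicators is regressed on a group dummy, while the data are generated with deliberately noninvariant loadings, the condition the study reports MIMIC tends to miss. This assumes the third-party semopy package and its lavaan-style model syntax; the population values are arbitrary.

```python
# Hedged MIMIC sketch (assumes the semopy package): latent mean difference
# tested via a group covariate, with noninvariant loadings built into the data.
import numpy as np
import pandas as pd
from semopy import Model

rng = np.random.default_rng(5)
n = 500
group = rng.integers(0, 2, n)
eta = rng.normal(0, 1, n)         # equal latent means across groups by design

# Noninvariant loadings: the focal group's second and third loadings differ.
lam = np.where(group[:, None] == 1, [0.9, 0.5, 0.7], [0.9, 0.9, 0.9])
y = lam * eta[:, None] + rng.normal(0, 0.5, (n, 3))

data = pd.DataFrame(y, columns=["y1", "y2", "y3"]).assign(group=group)
model = Model("eta =~ y1 + y2 + y3\neta ~ group")
model.fit(data)
print(model.inspect())            # inspect the eta ~ group estimate
```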
Tay, Louis; Drasgow, Fritz – Educational and Psychological Measurement, 2012
Two Monte Carlo simulation studies investigated the effectiveness of the mean adjusted χ²/df statistic proposed by Drasgow and colleagues; because of problems with that method, a new approach for assessing the goodness of fit of an item response theory model was developed. It has been previously recommended that mean adjusted…
Descriptors: Test Length, Monte Carlo Methods, Goodness of Fit, Item Response Theory
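For orientation, the following sketch computes a pairwise χ²/df fit value for two 2PL items, comparing observed joint response-pattern frequencies against model-implied ones obtained by Gauss-Hermite quadrature. It is a simplified stand-in: the published adjusted statistic also rescales to a reference sample size and accounts for parameter estimation, both omitted here.

```python
# Simplified pairwise chi-square/df fit check for two 2PL items.
import numpy as np

rng = np.random.default_rng(2)

def p2pl(theta, a, b):
    """2PL correct-response probability."""
    return 1 / (1 + np.exp(-a * (theta - b)))

a = np.array([1.0, 1.4])
b = np.array([-0.5, 0.5])                        # a two-item example
N = 5000
theta = rng.normal(0, 1, N)
resp = (rng.random((N, 2)) < p2pl(theta[:, None], a, b)).astype(int)

# Observed joint frequencies of the four (item 1, item 2) response patterns.
obs = np.array([[np.sum((resp[:, 0] == i) & (resp[:, 1] == j))
                 for j in (0, 1)] for i in (0, 1)], dtype=float)

# Model-implied pattern probabilities, integrating theta ~ N(0, 1) by
# Gauss-Hermite quadrature (probabilists' version).
q, w = np.polynomial.hermite_e.hermegauss(41)
w = w / w.sum()
p1, p2 = p2pl(q, a[0], b[0]), p2pl(q, a[1], b[1])
exp = np.array([[np.sum(w * (p1 if i else 1 - p1) * (p2 if j else 1 - p2))
                 for j in (0, 1)] for i in (0, 1)]) * N

chi_sq = np.sum((obs - exp) ** 2 / exp)
df = 3                                           # 4 cells - 1, parameters known
print(f"pairwise chi-square/df = {chi_sq / df:.2f}")
```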
Rupp, Andre A.; Zumbo, Bruno D. – Educational and Psychological Measurement, 2006
One theoretical feature that makes item response theory (IRT) models those of choice for many psychometric data analysts is parameter invariance, the equality of item and examinee parameters from different examinee populations or measurement conditions. In this article, using the well-known fact that item and examinee parameters are identical only…
Descriptors: Psychometrics, Probability, Simulation, Item Response Theory
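The identification fact the abstract alludes to is easy to verify numerically: a 2PL item response function is unchanged when the trait metric is linearly rescaled, provided the item parameters are transformed accordingly. The specific constants below are arbitrary.

```python
# Numeric check: 2PL probabilities are invariant under theta' = A*theta + B
# with a' = a / A and b' = A*b + B.
import numpy as np

def p2pl(theta, a, b):
    """2PL correct-response probability."""
    return 1 / (1 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)
a, b = 1.2, 0.4
A, B = 2.0, -1.0                  # arbitrary linear rescaling of the trait metric

print(np.allclose(p2pl(theta, a, b),
                  p2pl(A * theta + B, a / A, A * b + B)))   # True
```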

Berry, Kenneth J.; Mielke, Paul W., Jr. – Educational and Psychological Measurement, 1997
Describes a FORTRAN software program that calculates the probability of an observed difference between agreement measures obtained from two independent sets of raters. An example illustrates the use of the DIFFER program in evaluating undergraduate essays. (Author/SLD)
Descriptors: Comparative Analysis, Computer Software, Evaluation Methods, Higher Education
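DIFFER itself is a FORTRAN program; as a rough modern analogue, the sketch below estimates the probability of an observed difference between two independent agreement coefficients by permutation, using Cohen's kappa. The choice of kappa, the rating scale, and the resampling scheme are assumptions for illustration and may differ from DIFFER's exact computation.

```python
# Permutation analogue: probability of the observed difference between
# kappa coefficients from two independent rater pairs (simulated ratings).
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(3)

# Simulated ratings on a 4-point scale from two independent rater pairs.
set1 = rng.integers(0, 4, (50, 2))
set2 = rng.integers(0, 4, (60, 2))

def kappa(pairs):
    return cohen_kappa_score(pairs[:, 0], pairs[:, 1])

obs = kappa(set1) - kappa(set2)                # observed kappa difference

pooled = np.vstack([set1, set2])
n1 = len(set1)
diffs = []
for _ in range(2000):                          # permutation null distribution
    idx = rng.permutation(len(pooled))
    diffs.append(kappa(pooled[idx[:n1]]) - kappa(pooled[idx[n1:]]))

p = np.mean(np.abs(diffs) >= abs(obs))         # two-sided probability
print(f"kappa difference = {obs:.3f}, permutation p = {p:.3f}")
```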