Publication Date
In 2025: 0
Since 2024: 1
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 6
Since 2006 (last 20 years): 15
Descriptor
Probability: 20
Test Items: 20
Item Response Theory: 6
Simulation: 6
Computation: 5
Item Analysis: 5
Models: 5
Test Bias: 5
Classification: 4
Goodness of Fit: 4
Error Patterns: 3
Source
Educational and Psychological Measurement: 20
Publication Type
Journal Articles: 19
Reports - Research: 14
Reports - Evaluative: 3
Reports - Descriptive: 1
Education Level
Elementary Education: 1
Elementary Secondary Education: 1
Grade 11: 1
Grade 5: 1
Grade 8: 1
Location
Spain: 1
Laws, Policies, & Programs
No Child Left Behind Act 2001: 1
Dimitrov, Dimiter M.; Atanasov, Dimitar V. – Educational and Psychological Measurement, 2022
This study offers an approach to testing for differential item functioning (DIF) in a recently developed measurement framework, referred to as "D"-scoring method (DSM). Under the proposed approach, called "P-Z" method of testing for DIF, the item response functions of two groups (reference and focal) are compared by…
Descriptors: Test Bias, Methods, Test Items, Scoring
Martijn Schoenmakers; Jesper Tijmstra; Jeroen Vermunt; Maria Bolsinova – Educational and Psychological Measurement, 2024
Extreme response style (ERS), the tendency of participants to select extreme item categories regardless of the item content, has frequently been found to decrease the validity of Likert-type questionnaire results. For this reason, various item response theory (IRT) models have been proposed to model ERS and correct for it. Comparisons of these…
Descriptors: Item Response Theory, Response Style (Tests), Models, Likert Scales
Chen, Michelle Y.; Liu, Yan; Zumbo, Bruno D. – Educational and Psychological Measurement, 2020
This study introduces a novel differential item functioning (DIF) method based on propensity score matching that tackles two challenges in analyzing performance assessment data, that is, continuous task scores and lack of a reliable internal variable as a proxy for ability or aptitude. The proposed DIF method consists of two main stages. First,…
Descriptors: Probability, Scores, Evaluation Methods, Test Items
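[Editor's sketch] A minimal two-stage illustration of DIF screening with propensity score matching, using simulated covariates, group membership, and a continuous task score. The variable names and matching rule are assumptions for illustration, not the authors' exact procedure.

# Stage 1: estimate propensity of focal-group membership; Stage 2: match and compare.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 400
covariates = rng.normal(size=(n, 3))            # proxies for ability/aptitude
group = rng.integers(0, 2, size=n)              # 0 = reference, 1 = focal
task_score = covariates.sum(axis=1) + 0.3 * group + rng.normal(scale=1.0, size=n)

# Stage 1: propensity of focal-group membership given the observed covariates.
propensity = LogisticRegression().fit(covariates, group).predict_proba(covariates)[:, 1]

# Stage 2: nearest-neighbour match each focal examinee to a reference examinee
# on the propensity score, then compare the matched continuous task scores.
focal_idx = np.where(group == 1)[0]
ref_idx = np.where(group == 0)[0]
matches = [ref_idx[np.argmin(np.abs(propensity[ref_idx] - propensity[i]))] for i in focal_idx]
dif_effect = task_score[focal_idx].mean() - task_score[matches].mean()
print(f"matched focal-minus-reference difference: {dif_effect:.3f}")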
Kalinowski, Steven T. – Educational and Psychological Measurement, 2019
Item response theory (IRT) is a statistical paradigm for developing educational tests and assessing students. IRT, however, currently lacks an established graphical method for examining model fit for the three-parameter logistic model, the most flexible and popular IRT model in educational testing. A method is presented here to do this. The graph,…
Descriptors: Item Response Theory, Educational Assessment, Goodness of Fit, Probability
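[Editor's sketch] A generic graphical-fit check for a three-parameter logistic (3PL) item: compare model-implied probabilities with observed proportions correct in ability bins. The item parameters and simulated data are assumptions for illustration; this is not necessarily the specific graph the article proposes.

import numpy as np

def p_3pl(theta, a, b, c):
    # Standard three-parameter logistic item response function.
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

rng = np.random.default_rng(7)
a, b, c = 1.2, 0.0, 0.2
theta = rng.normal(size=5000)
responses = rng.random(5000) < p_3pl(theta, a, b, c)

# Compare observed vs. expected proportion correct within ability deciles.
bins = np.quantile(theta, np.linspace(0, 1, 11))
for lo, hi in zip(bins[:-1], bins[1:]):
    in_bin = (theta >= lo) & (theta < hi)
    observed = responses[in_bin].mean()
    expected = p_3pl(theta[in_bin], a, b, c).mean()
    print(f"theta in [{lo:+.2f}, {hi:+.2f}): observed {observed:.2f}, expected {expected:.2f}")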
DeMars, Christine E. – Educational and Psychological Measurement, 2016
Partially compensatory models may capture the cognitive skills needed to answer test items more realistically than compensatory models, but estimating the model parameters may be a challenge. Data were simulated to follow two different partially compensatory models, a model with an interaction term and a product model. The model parameters were…
Descriptors: Item Response Theory, Models, Thinking Skills, Test Items
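[Editor's note] For orientation, standard two-dimensional forms of the two model families are sketched below; the article's exact parameterizations, including its interaction-term model, may differ.

% Compensatory: high ability on one dimension can offset low ability on the other.
P(X=1 \mid \theta_1, \theta_2) = \frac{1}{1 + \exp\!\left[-(a_1\theta_1 + a_2\theta_2 + d)\right]}
% Partially compensatory (product) model: the component probabilities multiply,
% so a deficit on either dimension caps the overall probability of success.
P(X=1 \mid \theta_1, \theta_2) = \prod_{k=1}^{2} \frac{1}{1 + \exp\!\left[-a_k(\theta_k - b_k)\right]}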
Dardick, William R.; Mislevy, Robert J. – Educational and Psychological Measurement, 2016
A new variant of the iterative "data = fit + residual" data-analytical approach described by Mosteller and Tukey is proposed and implemented in the context of item response theory psychometric models. Posterior probabilities from a Bayesian mixture model of a Rasch item response theory model and an unscalable latent class are expressed…
Descriptors: Bayesian Statistics, Probability, Data Analysis, Item Response Theory
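[Editor's sketch] A minimal illustration of the mixture idea: the posterior probability that a response pattern comes from an unscalable class rather than a Rasch class, assuming the unscalable class responds at random. The fixed parameter values are assumptions; the article's mixture model is estimated, not fixed like this.

import numpy as np

def rasch_likelihood(x, theta, b):
    # Likelihood of a scored response pattern under the Rasch model.
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return np.prod(np.where(x == 1, p, 1.0 - p))

item_difficulty = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
pattern = np.array([1, 0, 1, 0, 0])        # one examinee's scored responses
theta_hat = 0.0                            # plug-in ability estimate
prior_rasch = 0.9                          # prior weight on the scalable class

lik_rasch = rasch_likelihood(pattern, theta_hat, item_difficulty)
lik_unscalable = 0.5 ** len(pattern)       # random responding in the unscalable class
posterior_unscalable = ((1 - prior_rasch) * lik_unscalable) / (
    prior_rasch * lik_rasch + (1 - prior_rasch) * lik_unscalable)
print(f"posterior probability of the unscalable class: {posterior_unscalable:.3f}")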
Raykov, Tenko; Marcoulides, George A.; Lee, Chun-Lung; Chang, Chi – Educational and Psychological Measurement, 2013
This note is concerned with a latent variable modeling approach for the study of differential item functioning in a multigroup setting. A multiple-testing procedure that can be used to evaluate group differences in response probabilities on individual items is discussed. The method is readily employed when the aim is also to locate possible…
Descriptors: Test Bias, Statistical Analysis, Models, Hypothesis Testing
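[Editor's sketch] The note works within a latent variable model; the sketch below is a much simpler observed-proportion analogue, intended only to illustrate the multiple-testing idea of evaluating per-item group differences at an adjusted significance level. All data and the Bonferroni choice are assumptions.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n_items, n_per_group = 10, 500
p_ref = np.full(n_items, 0.6)
p_focal = p_ref.copy()
p_focal[0] = 0.45                                   # one item with a true group difference
x_ref = rng.binomial(n_per_group, p_ref)
x_focal = rng.binomial(n_per_group, p_focal)

alpha = 0.05 / n_items                              # Bonferroni-adjusted level
for j in range(n_items):
    p1, p2 = x_ref[j] / n_per_group, x_focal[j] / n_per_group
    pooled = (x_ref[j] + x_focal[j]) / (2 * n_per_group)
    se = np.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (p1 - p2) / se
    p_value = 2 * norm.sf(abs(z))
    if p_value < alpha:
        print(f"item {j}: flagged (z = {z:.2f}, p = {p_value:.4f})")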
Kim, Eun Sook; Yoon, Myeongsun; Lee, Taehun – Educational and Psychological Measurement, 2012
Multiple-indicators multiple-causes (MIMIC) modeling is often used to test a latent group mean difference while assuming the equivalence of factor loadings and intercepts over groups. However, this study demonstrated that MIMIC was insensitive to the presence of factor loading noninvariance, which implies that factor loading invariance should be…
Descriptors: Test Items, Simulation, Testing, Statistical Analysis
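[Editor's note] For reference, a generic MIMIC specification with a grouping covariate is sketched below; the notation is illustrative, not taken from the article.

% Grouping covariate z (0 = reference, 1 = focal); \gamma is the latent mean difference.
\eta = \gamma z + \zeta
% A nonzero direct effect \beta_j on item j signals intercept noninvariance (uniform DIF).
y_j = \lambda_j \eta + \beta_j z + \varepsilon_j
% The usual MIMIC test assumes the loadings \lambda_j are equal across groups;
% loading (slope) noninvariance is a separate issue this specification does not capture.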
Hauser, Carl; Thum, Yeow Meng; He, Wei; Ma, Lingling – Educational and Psychological Measurement, 2015
When conducting item reviews, analysts evaluate an array of statistical and graphical information to assess the fit of a field test (FT) item to an item response theory model. The process can be tedious, particularly when the number of human reviews (HR) to be completed is large. Furthermore, such a process leads to decisions that are susceptible…
Descriptors: Test Items, Item Response Theory, Research Methodology, Decision Making
Paek, Insu; Wilson, Mark – Educational and Psychological Measurement, 2011
This study elaborates the Rasch differential item functioning (DIF) model formulation under the marginal maximum likelihood estimation context. Also, the Rasch DIF model performance was examined and compared with the Mantel-Haenszel (MH) procedure in small sample and short test length conditions through simulations. The theoretically known…
Descriptors: Test Bias, Test Length, Statistical Inference, Geometric Concepts
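[Editor's note] For reference, the standard Mantel-Haenszel quantities the Rasch DIF model is compared against are given below; the notation is generic.

% Stratify examinees by total score k; A_k, B_k are correct/incorrect counts for the
% reference group, C_k, D_k for the focal group, and T_k is the stratum size.
\hat{\alpha}_{MH} = \frac{\sum_k A_k D_k / T_k}{\sum_k B_k C_k / T_k},
\qquad
\Delta_{MH} = -2.35 \,\ln \hat{\alpha}_{MH}
% \Delta_{MH} near 0 indicates little DIF; |\Delta_{MH}| \ge 1.5, together with
% statistical significance, is the conventional ETS threshold for large DIF.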
Wang, Wen-Chung; Huang, Sheng-Yun – Educational and Psychological Measurement, 2011
The one-parameter logistic model with ability-based guessing (1PL-AG) has been recently developed to account for effect of ability on guessing behavior in multiple-choice items. In this study, the authors developed algorithms for computerized classification testing under the 1PL-AG and conducted a series of simulations to evaluate their…
Descriptors: Computer Assisted Testing, Classification, Item Analysis, Probability
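[Editor's sketch] A generic Wald sequential probability ratio test (SPRT) rule for pass/fail classification testing, written here for a plain Rasch/1PL likelihood rather than the 1PL-AG model the article studies. The cut point, indifference width, and error rates are illustrative assumptions.

import numpy as np

def p_correct(theta, b):
    # Rasch/1PL probability of a correct response.
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def sprt_decision(responses, difficulties, theta_cut=0.0, delta=0.3, alpha=0.05, beta=0.05):
    upper = np.log((1 - beta) / alpha)        # decide "above the cut"
    lower = np.log(beta / (1 - alpha))        # decide "below the cut"
    llr = 0.0
    for x, b in zip(responses, difficulties):
        p_hi = p_correct(theta_cut + delta, b)
        p_lo = p_correct(theta_cut - delta, b)
        llr += np.log(p_hi if x else 1 - p_hi) - np.log(p_lo if x else 1 - p_lo)
        if llr >= upper:
            return "classify above cut"
        if llr <= lower:
            return "classify below cut"
    return "continue testing"

print(sprt_decision([1, 1, 0, 1, 1, 1], [-0.5, 0.0, 0.2, 0.4, -0.1, 0.3]))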
Wyse, Adam E. – Educational and Psychological Measurement, 2011
Standard setting is a method used to set cut scores on large-scale assessments. One of the most popular standard setting methods is the Bookmark method. In the Bookmark method, panelists are asked to envision a response probability (RP) criterion and move through a booklet of ordered items based on a RP criterion. This study investigates whether…
Descriptors: Testing Programs, Standard Setting (Scoring), Cutting Scores, Probability
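[Editor's note] For orientation, under a 2PL item response function the ability level at which an item is mastered at a given response probability (RP) criterion has a closed form; the notation below is generic, not the article's.

% RP is the response probability criterion (commonly RP67, about 2/3).
P(\theta) = \frac{1}{1 + \exp[-a(\theta - b)]}
\quad\Longrightarrow\quad
\theta_{RP} = b + \frac{1}{a}\,\ln\!\frac{RP}{1 - RP}
% Items in the ordered booklet are ranked by \theta_{RP}; placing the bookmark on an
% item maps to a cut score at that item's \theta_{RP}.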
Ferdous, Abdullah A.; Plake, Barbara S. – Educational and Psychological Measurement, 2005
In an Angoff standard-setting procedure, judges estimate the probability that a hypothetical randomly selected minimally competent candidate will answer correctly each item constituting the test. In many cases, these item performance estimates are made twice, with information shared with the judges between estimates. Especially for long tests,…
Descriptors: Test Items, Probability, Standard Setting (Scoring)
Ferdous, Abdullah A.; Plake, Barbara S. – Educational and Psychological Measurement, 2007
In an Angoff standard setting procedure, judges estimate the probability that a hypothetical randomly selected minimally competent candidate will answer correctly each item in the test. In many cases, these item performance estimates are made twice, with information shared with the panelists between estimates. Especially for long tests, this…
Descriptors: Test Items, Probability, Item Analysis, Standard Setting (Scoring)
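[Editor's sketch] A worked arithmetic example of how Angoff item estimates aggregate into a cut score: each judge estimates, for every item, the probability that a minimally competent candidate answers correctly, and the cut score is the sum of the averaged item estimates. The judge values below are made up.

import numpy as np

# rows = judges, columns = items
estimates = np.array([
    [0.70, 0.55, 0.80, 0.40, 0.65],
    [0.75, 0.50, 0.85, 0.45, 0.60],
    [0.65, 0.60, 0.75, 0.35, 0.70],
])

item_means = estimates.mean(axis=0)       # averaged across judges, per item
cut_score = item_means.sum()              # expected raw score of the minimally competent candidate
print(f"item means: {np.round(item_means, 3)}")
print(f"recommended cut score: {cut_score:.2f} out of {estimates.shape[1]}")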

Wilcox, Rand R. – Educational and Psychological Measurement, 1981
A formal framework is presented for determining which of the distractors of multiple-choice test items has a small probability of being chosen by a typical examinee. The framework is based on a procedure similar to an indifference zone formulation of a ranking and selection problem. (Author/BW)
Descriptors: Mathematical Models, Multiple Choice Tests, Probability, Test Items
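[Editor's sketch] A simple descriptive screen of distractor choice proportions against a small threshold. This is not the paper's indifference-zone ranking-and-selection procedure, only an illustration of the quantity it works with; the counts and cutoff are made up.

import numpy as np

# counts of examinees choosing each option of one multiple-choice item;
# option "B" is keyed correct, the rest are distractors
options = ["A", "B", "C", "D", "E"]
counts = np.array([38, 420, 61, 12, 4])
key = "B"
threshold = 0.05                              # "rarely chosen" cutoff, illustrative

proportions = counts / counts.sum()
for opt, p in zip(options, proportions):
    if opt != key and p < threshold:
        print(f"distractor {opt}: chosen by {p:.1%} of examinees, candidate for revision")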