ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	0
Since 2017 (last 10 years)	2
Since 2007 (last 20 years)	7

Descriptor

Sampling	9
Test Items	9
Test Length	9
Item Response Theory	5
Difficulty Level	4
Sample Size	4
Computation	3
Error of Measurement	3
Simulation	3
Test Construction	3
Accuracy	2
Comparative Analysis	2
Item Analysis	2
Markov Processes	2
Mathematics	2
Monte Carlo Methods	2
Reliability	2
Scores	2
Test Format	2
Achievement Tests	1
Adolescents	1
Bayesian Statistics	1
Classification	1
Correlation	1
Cutting Scores	1
More ▼

Source

ProQuest LLC	3
Applied Measurement in…	1
ETS Research Report Series	1
Educational and Psychological…	1
Journal of Educational and…	1
Journal of Experimental…	1

Publication Type

Reports - Research	6
Journal Articles	5
Dissertations/Theses -…	3

Education Level

Audience

Location

Laws, Policies, & Programs

Assessments and Surveys

National Longitudinal Study…

What Works Clearinghouse Rating

Showing all 9 results Save | Export

Robustness of Weighted Differential Item Functioning (DIF) Analysis: The Case of Mantel-Haenszel DIF Statistics. Research Report. ETS RR-21-12

Peer reviewed
PDF on ERIC

Download full text

Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021

Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…

Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis

Detection and Treatment of Careless Responses to Improve Item Parameter Estimation

Peer reviewed

Direct link

Patton, Jeffrey M.; Cheng, Ying; Hong, Maxwell; Diao, Qi – Journal of Educational and Behavioral Statistics, 2019

In psychological and survey research, the prevalence and serious consequences of careless responses from unmotivated participants are well known. In this study, we propose to iteratively detect careless responders and cleanse the data by removing their responses. The careless responders are detected using person-fit statistics. In two simulation…

Descriptors: Test Items, Response Style (Tests), Identification, Computation

Evaluating the Consistency of Angoff-Based Cut Scores Using Subsets of Items within a Generalizability Theory Framework

Peer reviewed

Direct link

Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015

The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…

Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items

Accuracy and Variability of Item Parameter Estimates from Marginal Maximum a Posteriori Estimation and Bayesian Inference via Gibbs Samplers

Direct link

Wu, Yi-Fang – ProQuest LLC, 2015

Item response theory (IRT) uses a family of statistical models for estimating stable characteristics of items and examinees and defining how these characteristics interact in describing item and test performance. With a focus on the three-parameter logistic IRT (Birnbaum, 1968; Lord, 1980) model, the current study examines the accuracy and…

Descriptors: Item Response Theory, Test Items, Accuracy, Computation

Minimum Sample Size Requirements for Mokken Scale Analysis

Peer reviewed

Direct link

Straat, J. Hendrik; van der Ark, L. Andries; Sijtsma, Klaas – Educational and Psychological Measurement, 2014

An automated item selection procedure in Mokken scale analysis partitions a set of items into one or more Mokken scales, if the data allow. Two algorithms are available that pursue the same goal of selecting Mokken scales of maximum length: Mokken's original automated item selection procedure (AISP) and a genetic algorithm (GA). Minimum…

Descriptors: Sampling, Test Items, Effect Size, Scaling

Bi-Factor Multidimensional Item Response Theory Modeling for Subscores Estimation, Reliability, and Classification

Direct link

Md Desa, Zairul Nor Deana – ProQuest LLC, 2012

In recent years, there has been increasing interest in estimating and improving subscore reliability. In this study, the multidimensional item response theory (MIRT) and the bi-factor model were combined to estimate subscores, to obtain subscores reliability, and subscores classification. Both the compensatory and partially compensatory MIRT…

Descriptors: Item Response Theory, Computation, Reliability, Classification

Conditions Affecting the Accuracy of Classical Equating Methods for Small Samples under the NEAT Design: A Simulation Study

Direct link

Sunnassee, Devdass – ProQuest LLC, 2011

Small sample equating remains a largely unexplored area of research. This study attempts to fill in some of the research gaps via a large-scale, IRT-based simulation study that evaluates the performance of seven small-sample equating methods under various test characteristic and sampling conditions. The equating methods considered are typically…

Descriptors: Test Length, Test Format, Sample Size, Simulation

Estimation of Test Length for Domain-Referenced Reading Comprehension Tests.

Peer reviewed

Berk, Ronald A. – Journal of Experimental Education, 1980

A sampling methodology is proposed for determining lengths of tests designed to assess the comprehension of written discourse. It is based on Bormuth's transformational analysis, within a domain-referenced framework. Guidelines are provided for computing sample size and selecting sentences to which the transformational rules can be applied.…

Descriptors: Reading Comprehension, Reading Tests, Sampling, Test Construction

A Comparison of Simple Random Sampling Versus Stratification for Allocating Items to Subtests in Multiple Matrix Sampling.

Download full text

Scheetz, James P.; Forsyth, Robert A. – 1977

Empirical evidence is presented related to the effects of using a stratified sampling of items in multiple matrix sampling on the accuracy of estimates of the population mean. Data were obtained from a sample of 600 high school students for a 36-item mathematics test and a 40-item vocabulary test, both subtests of the Iowa Tests of Educational…

Descriptors: Achievement Tests, Difficulty Level, Item Analysis, Item Sampling

Berk, Ronald A.	1
Cheng, Ying	1
Diao, Qi	1
Dorans, Neil J.	1
Forsyth, Robert A.	1
Guo, Hongwen	1
Hong, Maxwell	1
Kannan, Priya	1
Katz, Irvin R.	1
Lu, Ru	1
Md Desa, Zairul Nor Deana	1
Patton, Jeffrey M.	1
Scheetz, James P.	1
Sgammato, Adrienne	1
Sijtsma, Klaas	1
Straat, J. Hendrik	1
Sunnassee, Devdass	1
Tannenbaum, Richard J.	1
Wu, Yi-Fang	1
van der Ark, L. Andries	1
More ▼