Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 1 |
Since 2006 (last 20 years) | 5 |
Descriptor
Models | 11 |
Test Items | 11 |
Sampling | 9 |
Item Response Theory | 6 |
Difficulty Level | 4 |
Test Construction | 4 |
Item Sampling | 3 |
Bayesian Statistics | 2 |
Computation | 2 |
Monte Carlo Methods | 2 |
Nonparametric Statistics | 2 |
Source
Applied Psychological… | 2 |
Assessment & Evaluation in… | 1 |
Educational and Psychological… | 1 |
Measurement:… | 1 |
Psychological Review | 1 |
Psychometrika | 1 |
Author
Allen, Nancy L. | 1 |
Babcock, Ben | 1 |
Burton, Richard F. | 1 |
Donoghue, John R. | 1 |
Fan, Xitao | 1 |
Gifford, Janice A. | 1 |
Hambleton, Ronald K. | 1 |
Harris, Chester W. | 1 |
Liang, Xinya | 1 |
Lin, Zhongtian | 1 |
Paek, Insu | 1 |
Publication Type
Journal Articles | 7 |
Reports - Research | 6 |
Reports - Evaluative | 3 |
Speeches/Meeting Papers | 3 |
Reports - Descriptive | 2 |
Assessments and Surveys
National Assessment of… | 1 |
Texas Assessment of Academic… | 1 |
Paek, Insu; Liang, Xinya; Lin, Zhongtian – Measurement: Interdisciplinary Research and Perspectives, 2021
The property of item parameter invariance in item response theory (IRT) plays a pivotal role in the applications of IRT such as test equating. The scope of parameter invariance when using estimates from finite biased samples in the applications of IRT does not appear to be clearly documented in the IRT literature. This article provides information…
Descriptors: Item Response Theory, Computation, Test Items, Bias
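The invariance property this abstract refers to can be illustrated numerically: under the two-parameter logistic model, rescaling the latent trait linearly while transforming the item parameters accordingly leaves every response probability unchanged. A minimal sketch (the function names and constants are illustrative, not code from the article):

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic IRT model: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Linear indeterminacy of the IRT scale: transforming theta -> A*theta + B
# while setting a -> a/A and b -> A*b + B leaves probabilities unchanged.
A, B = 1.7, 0.5
theta, a, b = 0.3, 1.2, -0.4

p_original    = p_2pl(theta, a, b)
p_transformed = p_2pl(A * theta + B, a / A, A * b + B)
# the two probabilities are identical, which is what makes equating possible
```

Invariance in this exact sense holds only for true parameters; the article's point is about what happens when finite, biased samples supply the estimates instead.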
Straat, J. Hendrik; van der Ark, L. Andries; Sijtsma, Klaas – Educational and Psychological Measurement, 2014
An automated item selection procedure in Mokken scale analysis partitions a set of items into one or more Mokken scales, if the data allow. Two algorithms are available that pursue the same goal of selecting Mokken scales of maximum length: Mokken's original automated item selection procedure (AISP) and a genetic algorithm (GA). Minimum…
Descriptors: Sampling, Test Items, Effect Size, Scaling
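Both item-selection algorithms mentioned here maximize Mokken's scalability coefficient H. A simplified computation of the overall H for dichotomous item scores (a sketch of the coefficient only, not the AISP or genetic algorithm from the article):

```python
import itertools

def mokken_H(data):
    """Overall scalability coefficient H for dichotomous item scores.

    data: rows = persons, columns = 0/1 item scores.
    H = 1 - (observed Guttman errors) / (errors expected under independence),
    summed over all item pairs.
    """
    n = len(data)
    k = len(data[0])
    p = [sum(row[j] for row in data) / n for j in range(k)]
    obs_err = exp_err = 0.0
    for i, j in itertools.combinations(range(k), 2):
        # order the pair so 'easy' is the more popular item
        easy, hard = (i, j) if p[i] >= p[j] else (j, i)
        # Guttman error: failing the easy item while passing the hard one
        obs_err += sum(1 for row in data if row[easy] == 0 and row[hard] == 1)
        exp_err += n * (1 - p[easy]) * p[hard]
    return 1.0 - obs_err / exp_err

# a perfect Guttman pattern has no errors, so H = 1
H = mokken_H([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])
```

The minimum-H thresholds the abstract alludes to are then applied to pairwise and per-item versions of this same coefficient.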
Babcock, Ben – Applied Psychological Measurement, 2011
Relatively little research has been conducted with the noncompensatory class of multidimensional item response theory (MIRT) models. A Monte Carlo simulation study was conducted exploring the estimation of a two-parameter noncompensatory item response theory (IRT) model. The estimation method used was a Metropolis-Hastings within Gibbs algorithm…
Descriptors: Item Response Theory, Sampling, Computation, Statistical Analysis
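The estimation method named here, Metropolis-Hastings within Gibbs, updates one parameter block at a time with a random-walk MH step. A toy sketch on a simple target density (the target and tuning constants are illustrative assumptions, not the noncompensatory MIRT posterior from the study):

```python
import math
import random

def mh_within_gibbs(log_post, init, n_iter=5000, step=0.5, seed=1):
    """Metropolis-Hastings within Gibbs: cycle through parameters,
    proposing a random-walk update for one at a time and accepting
    with the usual MH probability."""
    random.seed(seed)
    x = list(init)
    draws = []
    for _ in range(n_iter):
        for j in range(len(x)):
            prop = x[:]
            prop[j] += random.gauss(0.0, step)
            delta = log_post(prop) - log_post(x)
            if delta >= 0 or random.random() < math.exp(delta):
                x = prop
        draws.append(x[:])
    return draws

# toy target: two independent standard normals
log_post = lambda v: -0.5 * sum(t * t for t in v)
draws = mh_within_gibbs(log_post, [3.0, -3.0])
burned = draws[1000:]
means = [sum(d[j] for d in burned) / len(burned) for j in range(2)]
# both chain means drift toward the target mean of 0
```

In the MIRT setting each "block" would be a person's ability vector or an item's parameter vector rather than a scalar, but the accept/reject logic is the same.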
Ratcliff, Roger; Starns, Jeffrey J. – Psychological Review, 2009
A new model for confidence judgments in recognition memory is presented. In the model, the match between a single test item and memory produces a distribution of evidence, with better matches corresponding to distributions with higher means. On this match dimension, confidence criteria are placed, and the areas between the criteria under the…
Descriptors: Recognition (Psychology), Models, Test Items, Reaction Time
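The criterion-placement idea can be made concrete with a normal evidence distribution: each confidence category's probability is the area between adjacent criteria. A hedged sketch (the Gaussian form and the particular criterion values are illustrative assumptions):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def confidence_probs(mu, criteria):
    """Probability of each confidence category: the area of the match
    (evidence) distribution between successive confidence criteria."""
    cuts = [-math.inf] + list(criteria) + [math.inf]
    return [normal_cdf(cuts[i + 1], mu) - normal_cdf(cuts[i], mu)
            for i in range(len(cuts) - 1)]

# a better match (higher mean) shifts area into the high-confidence categories
weak   = confidence_probs(0.5, [-1.0, 0.0, 1.0])
strong = confidence_probs(1.5, [-1.0, 0.0, 1.0])
```

With fixed criteria, raising the mean of the match distribution increases the area in the topmost interval, which is how better memory matches yield higher-confidence responses in this class of model.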
Gifford, Janice A.; Hambleton, Ronald K. – 1980
Technical issues in item selection and reliability assessment are examined in relation to criterion-referenced tests constructed to provide group information. The purpose is to emphasize test building and the evaluation of test scores in program evaluation studies. It is stressed that an evaluator employ a performance or…
Descriptors: Criterion Referenced Tests, Group Testing, Item Sampling, Models
Van Onna, Marieke J. H. – Applied Psychological Measurement, 2004
Coefficient "H" is used as an index of scalability in nonparametric item response theory (NIRT). It indicates the degree to which a set of items rank orders examinees. Theoretical sampling distributions, however, have only been derived asymptotically and only under restrictive conditions. Bootstrap methods offer an alternative possibility to…
Descriptors: Sampling, Item Response Theory, Scaling, Comparative Analysis
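The bootstrap alternative the abstract points to resamples respondents with replacement and recomputes the statistic each time, giving an empirical sampling distribution without asymptotic assumptions. A generic percentile-bootstrap sketch (the statistic here is a simple stand-in, not coefficient H itself):

```python
import random

def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=7):
    """Percentile bootstrap: resample respondents (rows) with replacement
    and take empirical quantiles of the recomputed statistic."""
    random.seed(seed)
    n = len(data)
    reps = sorted(stat([data[random.randrange(n)] for _ in range(n)])
                  for _ in range(n_boot))
    lo = reps[int(alpha / 2 * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# illustrative statistic: mean total score on a set of dichotomous items
total_mean = lambda rows: sum(sum(r) for r in rows) / len(rows)
sample = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 0]]
lo, hi = bootstrap_ci(sample, total_mean)
```

Replacing `total_mean` with a scalability coefficient gives the kind of bootstrap interval for H that the article investigates.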
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2006
Many academic tests (e.g. short-answer and multiple-choice) sample required knowledge with questions scoring 0 or 1 (dichotomous scoring). Few textbooks give useful guidance on the length of test needed to do this reliably. Posey's binomial error model of 1932 provides the best starting point, but allows neither for heterogeneity of question…
Descriptors: Item Sampling, Tests, Test Length, Test Reliability
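Posey's binomial error model, the starting point named here, treats a test of n equivalent dichotomous questions as n Bernoulli trials, so the standard error of a proportion-correct score is sqrt(p(1-p)/n). A minimal sketch of the resulting test-length calculation (illustrative only; it ignores the question heterogeneity the article goes on to address):

```python
import math

def binomial_sem(p, n):
    """Standard error of a proportion-correct score under the binomial
    error model: sqrt(p*(1-p)/n) for a test of n dichotomous questions."""
    return math.sqrt(p * (1.0 - p) / n)

def length_for_sem(p, target_sem):
    """Smallest test length whose binomial SEM is at or below the target."""
    return math.ceil(p * (1.0 - p) / target_sem ** 2)

# measuring a 60%-ability candidate to within 5 percentage points (1 SEM)
n_needed = length_for_sem(0.6, 0.05)  # 0.24 / 0.0025 = 96 questions
```

The steep growth of `n_needed` as the target SEM shrinks is exactly why short tests sample required knowledge so unreliably.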
Revuelta, Javier – Psychometrika, 2004
Two psychometric models are presented for evaluating the difficulty of the distractors in multiple-choice items. They are based on the criterion of rising distractor selection ratios, which facilitates interpretation of the subject and item parameters. Statistical inferential tools are developed in a Bayesian framework: modal a posteriori…
Descriptors: Multiple Choice Tests, Psychometrics, Models, Difficulty Level
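The selection-ratio idea behind these models can be illustrated descriptively: within ability-ordered score groups, tabulate the fraction of examinees choosing each option. A simplified sketch (a descriptive tabulation only, not the Bayesian estimation machinery of the article):

```python
def selection_ratios(responses, n_groups=3):
    """Fraction of each ability group choosing each option.

    responses: list of (total_score, chosen_option) pairs.
    Returns {option: [ratio in lowest group, ..., ratio in highest group]}.
    """
    ordered = sorted(responses)                      # low to high ability
    size = len(ordered) // n_groups
    groups = [ordered[g * size:(g + 1) * size] for g in range(n_groups)]
    options = sorted({c for _, c in responses})
    return {opt: [sum(1 for _, c in g if c == opt) / len(g) for g in groups]
            for opt in options}

# distractor "B" is chosen mostly by low scorers, the key "A" by high scorers
data = [(1, "B"), (2, "B"), (3, "B"), (4, "A"), (5, "A"),
        (6, "A"), (7, "A"), (8, "A"), (9, "A")]
ratios = selection_ratios(data)
```

How these ratios change across the ability range is what the distractor difficulty parameters in the two psychometric models are meant to capture.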
Harris, Chester W.; And Others – 1980
The third section of this two-volume report examines the utility of the model developed in section I for application in a classroom testing situation. The model was applied in arithmetic classes through a weekly testing program. The teacher specified the generic task being taught, tests were constructed using a table of random numbers to select…
Descriptors: Achievement Tests, Classroom Research, Difficulty Level, Elementary Education
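The construction step described here, drawing items from a defined generic task with a table of random numbers, has a direct programmatic analogue in sampling item numbers without replacement. A minimal sketch (the bank size and test length are illustrative assumptions):

```python
import random

def draw_test_form(item_bank_size, test_length, seed):
    """Randomly sample distinct item numbers from a bank, the programmatic
    analogue of selecting items with a table of random numbers."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, item_bank_size + 1), test_length))

# one weekly form of 20 items drawn from a 200-item arithmetic task bank
weekly_form = draw_test_form(200, 20, seed=3)
```

Fixing a different seed each week yields independent random forms from the same task definition, which is what makes week-to-week score comparisons interpretable under the model.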
Allen, Nancy L.; Donoghue, John R. – 1995
This Monte Carlo study examined the effect of complex sampling of items on the measurement of differential item functioning (DIF) using the Mantel-Haenszel procedure. Data were generated using a three-parameter logistic item response theory model according to the balanced incomplete block (BIB) design used in the National Assessment of Educational…
Descriptors: Computer Assisted Testing, Difficulty Level, Elementary Secondary Education, Identification
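The Mantel-Haenszel procedure used in this study pools 2x2 tables (group by correct/incorrect) across matched score levels into a common odds ratio, often reported on the ETS delta scale. A minimal sketch of the estimator (the example tables are illustrative, not NAEP data):

```python
import math

def mantel_haenszel(tables):
    """MH common odds ratio across score-level 2x2 tables.

    Each table: (ref_correct, ref_wrong, focal_correct, focal_wrong)."""
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

def mh_delta(alpha):
    """ETS delta scale: MH D-DIF = -2.35 * ln(alpha_MH); 0 means no DIF."""
    return -2.35 * math.log(alpha)

# no DIF: the two groups have identical odds at every score level
alpha = mantel_haenszel([(40, 10, 20, 5), (30, 30, 15, 15)])
```

The study's question is how BIB spiraling, where each examinee sees only some item blocks, perturbs these per-score-level tables and hence the DIF statistic.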
Fan, Xitao; And Others – 1994
The hypothesis that faulty classical psychometric and sampling procedures in test construction could generate systematic bias against ethnic groups with smaller representation in the test construction sample was studied empirically. Two test construction models were developed: one with differential representation of ethnic groups (White, African…
Descriptors: Ethnic Groups, Genetics, High School Students, High Schools
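The mechanism hypothesized here is easy to see in miniature: classical item statistics computed on a pooled sample are sample-size weighted, so when one group dominates the construction sample, the pooled difficulty tracks that group. A toy illustration (the p-values and group sizes are invented for the example):

```python
def pooled_p(p_by_group, n_by_group):
    """Pooled item difficulty (proportion correct) when groups are unequally
    represented: the sample-size weighted mean of the group p-values."""
    total = sum(n_by_group)
    return sum(p * n for p, n in zip(p_by_group, n_by_group)) / total

# an item easy for the majority group (p = .8) but hard for a small
# minority group (p = .3) looks easy overall in a 900/100 sample
p_pooled = pooled_p([0.8, 0.3], [900, 100])  # 0.75, close to the majority value
```

Items are then retained or rejected on a statistic that reflects mostly the majority group, which is the systematic bias the study tests empirically with its two construction models.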