Publication Date
  In 2025: 0
  Since 2024: 0
  Since 2021 (last 5 years): 1
  Since 2016 (last 10 years): 2
  Since 2006 (last 20 years): 9
Descriptor
  Test Items: 44
  Sampling: 34
  Test Construction: 17
  Item Response Theory: 16
  Item Sampling: 10
  Estimation (Mathematics): 8
  Difficulty Level: 7
  Equated Scores: 7
  Foreign Countries: 7
  Educational Assessment: 6
  Language Tests: 6
Author
  Donoghue, John R.: 3
  Allen, Nancy L.: 2
  Donovan, Jenny: 2
  Hambleton, Ronald K.: 2
  Johnson, Eugene G.: 2
  Lennon, Melissa: 2
  Meijer, Rob R.: 2
  Albert, James H.: 1
  Babcock, Ben: 1
  Baker, Frank B.: 1
  Bayless, David L.: 1
Publication Type
  Reports - Evaluative: 44
  Journal Articles: 21
  Speeches/Meeting Papers: 15
  Numerical/Quantitative Data: 4
  Opinion Papers: 2
  Collected Works - General: 1
  Guides - Non-Classroom: 1
  Information Analyses: 1
Education Level
  Elementary Education: 3
  Elementary Secondary Education: 2
  Grade 6: 2
  Higher Education: 1
Assessments and Surveys
  National Assessment of…: 5
  Program for International…: 2
  Child Behavior Checklist: 1
  College Board Achievement…: 1
  International Adult Literacy…: 1
  SAT (College Admission Test): 1
van der Linden, Wim J. – Journal of Educational and Behavioral Statistics, 2022
The current literature on test equating generally defines it as the process necessary to obtain score comparability between different test forms. This definition contrasts with Lord's foundational paper, which viewed equating as the process required to obtain comparability of measurement scale between forms. The distinction between the notions…
Descriptors: Equated Scores, Test Items, Scores, Probability
Cappaert, Kevin J.; Wen, Yao; Chang, Yu-Feng – Measurement: Interdisciplinary Research and Perspectives, 2018
Events such as curriculum changes or practice effects can lead to item parameter drift (IPD) in computer adaptive testing (CAT). The current investigation introduced a point- and weight-adjusted D² method for IPD detection for use in a CAT environment when items are suspected of drifting across test administrations. Type I error and…
Descriptors: Adaptive Testing, Computer Assisted Testing, Test Items, Identification
Freeman, Andrew J.; Youngstrom, Eric A.; Frazier, Thomas W.; Youngstrom, Jennifer Kogos; Demeter, Christine; Findling, Robert L. – Psychological Assessment, 2012
Robust screening measures that perform well in different populations could help improve the accuracy of diagnosis of pediatric bipolar disorder. Changes in sampling could influence the performance of items and potentially influence total scores enough to alter the predictive utility of scores. Additionally, creating a brief version of a measure by…
Descriptors: Test Items, Sampling, Caregivers, Test Bias
Babcock, Ben – Applied Psychological Measurement, 2011
Relatively little research has been conducted with the noncompensatory class of multidimensional item response theory (MIRT) models. A Monte Carlo simulation study was conducted exploring the estimation of a two-parameter noncompensatory item response theory (IRT) model. The estimation method used was a Metropolis-Hastings within Gibbs algorithm…
Descriptors: Item Response Theory, Sampling, Computation, Statistical Analysis
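The Metropolis-Hastings-within-Gibbs move mentioned in this abstract can be illustrated on a small scale. The sketch below is not the study's MIRT estimator; it is a generic two-parameter illustration, with a toy target density and illustrative names, of updating each coordinate in turn with a random-walk MH step when the full conditionals cannot be sampled directly:

```python
import math
import random

# Generic Metropolis-Hastings-within-Gibbs sketch (toy target, not the
# noncompensatory MIRT model from the study): each coordinate of a 2-D
# density is updated in turn with a random-walk MH accept/reject step.

def log_target(x, y):
    # Toy non-conjugate density: exp(-(x^2 + y^2 + x^2 * y^2) / 2)
    return -(x * x + y * y + x * x * y * y) / 2

def mh_within_gibbs(n_draws, step=1.0, seed=0):
    rng = random.Random(seed)
    x = y = 0.0
    draws = []
    for _ in range(n_draws):
        # MH update of x holding y fixed
        xp = x + rng.gauss(0, step)
        if math.log(rng.random()) < log_target(xp, y) - log_target(x, y):
            x = xp
        # MH update of y holding x fixed
        yp = y + rng.gauss(0, step)
        if math.log(rng.random()) < log_target(x, yp) - log_target(x, y):
            y = yp
        draws.append((x, y))
    return draws

draws = mh_within_gibbs(5000)
```

By symmetry of the toy target, the retained chain for either coordinate should average near zero; in an IRT application the per-coordinate updates would instead cycle over person and item parameters.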
Haberman, Shelby J. – Educational Testing Service, 2010
Sampling errors limit the accuracy with which forms can be linked. Limitations on accuracy are especially important in testing programs in which a very large number of forms are employed. Standard inequalities in mathematical statistics may be used to establish lower bounds on the achievable linking accuracy. To illustrate results, a variety of…
Descriptors: Testing Programs, Equated Scores, Sampling, Accuracy
Waller, Niels G. – Applied Psychological Measurement, 2008
Reliability is a property of test scores from individuals who have been sampled from a well-defined population. Reliability indices, such as coefficient alpha and related formulas for internal consistency reliability (KR-20, Hoyt's reliability), yield lower bound reliability estimates when (a) subjects have been sampled from a single population and when…
Descriptors: Test Items, Reliability, Scores, Psychometrics
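The KR-20 lower-bound estimate named in this abstract is computable directly from a 0/1 item-score matrix. A minimal sketch, with made-up data and an illustrative function name:

```python
# KR-20 internal-consistency reliability for dichotomously scored items,
# a lower-bound estimate under the single-population sampling condition
# described in the abstract. Data below are illustrative only.

def kr20(scores):
    """scores: list of examinee rows, each a list of 0/1 item scores."""
    n = len(scores)
    n_items = len(scores[0])
    # Per-item proportion correct p_j; p_j * (1 - p_j) is the item variance
    p = [sum(row[j] for row in scores) / n for j in range(n_items)]
    pq_sum = sum(pj * (1 - pj) for pj in p)
    # Variance of total scores (population formula)
    totals = [sum(row) for row in scores]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    return (n_items / (n_items - 1)) * (1 - pq_sum / var_t)

data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
rel = kr20(data)
```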
Roberts, James S.; Wedell, Douglas H.; Laughlin, James E. – 1998
The Likert rating scale procedure is often used in conjunction with a graded disagree-agree response scale to measure attitudes. Item characteristic curves associated with graded disagree-agree responses are generally single-peaked, nonmonotonic functions of true attitude. These characteristics are, thus, more generally consistent with an…
Descriptors: Attitudes, Likert Scales, Sampling, Test Items

Allen, Nancy L.; Donoghue, John R. – Journal of Educational Measurement, 1996
Examined the effect of complex sampling of items on the measurement of differential item functioning (DIF) using the Mantel-Haenszel procedure through a Monte Carlo study. Suggests the superiority of the pooled booklet method when items are selected for examinees according to a balanced incomplete block design. Discusses implications for other DIF…
Descriptors: Item Bias, Monte Carlo Methods, Research Design, Sampling
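The Mantel-Haenszel procedure used in this DIF study pools a 2x2 table per matching stratum (e.g., per total-score level) into a common odds-ratio estimate. A minimal sketch of that estimator, with illustrative counts rather than data from the study:

```python
import math

# Mantel-Haenszel common odds-ratio estimator for DIF screening.
# Each stratum k contributes a 2x2 table:
#   (a, b) = reference group (correct, incorrect)
#   (c, d) = focal group     (correct, incorrect)

def mh_odds_ratio(tables):
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

tables = [
    (40, 10, 35, 15),  # stratum 1 (illustrative counts)
    (30, 20, 25, 25),  # stratum 2
    (20, 30, 15, 35),  # stratum 3
]
alpha_mh = mh_odds_ratio(tables)
# MH D-DIF statistic on the ETS delta scale
mh_d_dif = -2.35 * math.log(alpha_mh)
```

A value of alpha_mh above 1 favors the reference group, giving a negative D-DIF; the pooled-booklet question in the abstract concerns how the strata are formed when examinees see different item samples.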
Van Onna, Marieke J. H. – Applied Psychological Measurement, 2004
Coefficient "H" is used as an index of scalability in nonparametric item response theory (NIRT). It indicates the degree to which a set of items rank orders examinees. Theoretical sampling distributions, however, have only been derived asymptotically and only under restrictive conditions. Bootstrap methods offer an alternative possibility to…
Descriptors: Sampling, Item Response Theory, Scaling, Comparative Analysis
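Coefficient H and the bootstrap alternative this abstract describes can both be sketched briefly. The code below computes Loevinger's H from Guttman-error counts and resamples examinees to approximate its sampling distribution; the data and function names are illustrative, not from the study:

```python
import random

# Loevinger's scalability coefficient H for dichotomous items, plus a
# nonparametric bootstrap of its sampling distribution (the alternative
# to asymptotic theory discussed in the abstract). Data illustrative.

def coefficient_h(scores):
    n = len(scores)
    m = len(scores[0])
    p = [sum(row[j] for row in scores) / n for j in range(m)]
    obs = exp = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            easy, hard = (i, j) if p[i] >= p[j] else (j, i)
            # Guttman error: harder item passed while easier item failed
            obs += sum(1 for row in scores if row[hard] == 1 and row[easy] == 0)
            # Expected errors under marginal independence
            exp += n * p[hard] * (1 - p[easy])
    return 1 - obs / exp if exp > 0 else float("nan")

def bootstrap_h(scores, reps=100, seed=1):
    rng = random.Random(seed)
    n = len(scores)
    draws = []
    while len(draws) < reps:
        h = coefficient_h([rng.choice(scores) for _ in range(n)])
        if h == h:  # skip degenerate (all-constant) resamples
            draws.append(h)
    return draws

data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 0], [1, 0, 1]]
h = coefficient_h(data)
hs = bootstrap_h(data)
```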
Holland, Paul W. – 1989
A simple technique, developed by A. Phillips (1987) is used to approximate the covariance between the Mantel-Haenszel log-odds-ratio estimator for a 2 x 2 x k table and the sample marginal proportions. These results are then applied to obtain an approximate variance estimate of an adjusted risk difference based on the Mantel-Haenszel odds-ratio…
Descriptors: Difficulty Level, Estimation (Mathematics), Item Bias, Risk

Baker, Frank B. – Applied Psychological Measurement, 1996
Using the characteristic curve method for dichotomously scored test items, the sampling distributions of equating coefficients were examined. Simulations indicate that for the equating conditions studied, the sampling distributions of the equating coefficients appear to have acceptable characteristics, suggesting confidence in the values obtained…
Descriptors: Equated Scores, Item Response Theory, Sampling, Statistical Distributions
Karabatsos, George; Sheu, Ching-Fan – Applied Psychological Measurement, 2004
This study introduces an order-constrained Bayes inference framework useful for analyzing data containing dichotomous scored item responses, under the assumptions of either the monotone homogeneity model or the double monotonicity model of nonparametric item response theory (NIRT). The framework involves the implementation of Gibbs sampling to…
Descriptors: Inferences, Nonparametric Statistics, Item Response Theory, Data Analysis
Skaggs, Gary; Tessema, Aster – 2001
This paper presents an application of the bookmark procedure to a test composed of increasing text difficulty levels. The Test of English Proficiency for Adults (TEPA) was used for this study. Three forms of the TEPA were field tested in 1999 with approximately 1,000 non-native English speaking students enrolled in English-as-a-Second-Language…
Descriptors: Adults, Difficulty Level, English (Second Language), Language Tests

Hanson, Bradley A.; And Others – Applied Psychological Measurement, 1993
The delta method was used to derive standard errors (SEs) of the Levine observed score and Levine true score linear test equating methods using data from two test forms. SEs derived without the normality assumption and bootstrap SEs were very close. The situation with skewed score distributions is also discussed. (SLD)
Descriptors: Equated Scores, Equations (Mathematics), Error of Measurement, Sampling
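The bootstrap standard errors this abstract compares against can be sketched for a simple linear (mean-sigma) observed-score equating function. This is a generic illustration with made-up score data, not the Levine methods or data from the study:

```python
import math
import random

# Bootstrap standard error of a linear (mean-sigma) observed-score
# equating function -- a resampling counterpart to delta-method SEs.
# Score data below are illustrative only.

def mean_sd(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, math.sqrt(v)

def linear_equate(x, form_x, form_y):
    mx, sx = mean_sd(form_x)
    my, sy = mean_sd(form_y)
    return my + (sy / sx) * (x - mx)

def bootstrap_se(x, form_x, form_y, reps=500, seed=0):
    rng = random.Random(seed)
    vals = []
    for _ in range(reps):
        bx = [rng.choice(form_x) for _ in form_x]  # resample Form X takers
        by = [rng.choice(form_y) for _ in form_y]  # resample Form Y takers
        vals.append(linear_equate(x, bx, by))
    m = sum(vals) / len(vals)
    return math.sqrt(sum((v - m) ** 2 for v in vals) / (len(vals) - 1))

form_x = [12, 15, 18, 20, 22, 25, 28, 30, 33, 35]
form_y = [10, 14, 16, 19, 21, 24, 27, 29, 31, 34]
se = bootstrap_se(24, form_x, form_y)
```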
Berger, Martijn P. F. – 1989
The problem of obtaining designs that result in the most precise parameter estimates is encountered in at least two situations where item response theory (IRT) models are used. In so-called two-stage testing procedures, certain designs that match difficulty levels of the test items with the ability of the examinees may be located. Such designs…
Descriptors: Difficulty Level, Efficiency, Equations (Mathematics), Heuristics