Publication Date
  In 2025: 0
  Since 2024: 0
  Since 2021 (last 5 years): 1
  Since 2016 (last 10 years): 2
  Since 2006 (last 20 years): 9
Descriptor
  Test Items: 44
  Sampling: 34
  Test Construction: 17
  Item Response Theory: 16
  Item Sampling: 10
  Estimation (Mathematics): 8
  Difficulty Level: 7
  Equated Scores: 7
  Foreign Countries: 7
  Educational Assessment: 6
  Language Tests: 6
Author
  Donoghue, John R.: 3
  Allen, Nancy L.: 2
  Donovan, Jenny: 2
  Hambleton, Ronald K.: 2
  Johnson, Eugene G.: 2
  Lennon, Melissa: 2
  Meijer, Rob R.: 2
  Albert, James H.: 1
  Babcock, Ben: 1
  Baker, Frank B.: 1
  Bayless, David L.: 1
Publication Type
  Reports - Evaluative: 44
  Journal Articles: 21
  Speeches/Meeting Papers: 15
  Numerical/Quantitative Data: 4
  Opinion Papers: 2
  Collected Works - General: 1
  Guides - Non-Classroom: 1
  Information Analyses: 1
Education Level
  Elementary Education: 3
  Elementary Secondary Education: 2
  Grade 6: 2
  Higher Education: 1
Assessments and Surveys
  National Assessment of…: 5
  Program for International…: 2
  Child Behavior Checklist: 1
  College Board Achievement…: 1
  International Adult Literacy…: 1
  SAT (College Admission Test): 1
van der Linden, Wim J. – Journal of Educational and Behavioral Statistics, 2022
The current literature on test equating generally defines it as the process necessary to obtain score comparability between different test forms. This definition contrasts with Lord's foundational paper, which viewed equating as the process required to obtain comparability of measurement scale between forms. The distinction between the notions…
Descriptors: Equated Scores, Test Items, Scores, Probability
Cappaert, Kevin J.; Wen, Yao; Chang, Yu-Feng – Measurement: Interdisciplinary Research and Perspectives, 2018
Events such as curriculum changes or practice effects can lead to item parameter drift (IPD) in computer adaptive testing (CAT). The current investigation introduced a point- and weight-adjusted D² method for IPD detection for use in a CAT environment when items are suspected of drifting across test administrations. Type I error and…
Descriptors: Adaptive Testing, Computer Assisted Testing, Test Items, Identification
Freeman, Andrew J.; Youngstrom, Eric A.; Frazier, Thomas W.; Youngstrom, Jennifer Kogos; Demeter, Christine; Findling, Robert L. – Psychological Assessment, 2012
Robust screening measures that perform well in different populations could help improve the accuracy of diagnosis of pediatric bipolar disorder. Changes in sampling could influence the performance of items and potentially influence total scores enough to alter the predictive utility of scores. Additionally, creating a brief version of a measure by…
Descriptors: Test Items, Sampling, Caregivers, Test Bias
Babcock, Ben – Applied Psychological Measurement, 2011
Relatively little research has been conducted with the noncompensatory class of multidimensional item response theory (MIRT) models. A Monte Carlo simulation study was conducted exploring the estimation of a two-parameter noncompensatory item response theory (IRT) model. The estimation method used was a Metropolis-Hastings within Gibbs algorithm…
Descriptors: Item Response Theory, Sampling, Computation, Statistical Analysis
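The Metropolis-Hastings-within-Gibbs move mentioned in this abstract can be illustrated on a small scale. The sketch below is not the study's MIRT estimator; it is a generic two-parameter illustration, with a toy target density and illustrative names, of updating each coordinate in turn with a random-walk MH step when the full conditionals cannot be sampled directly:

```python
import math
import random

# Generic Metropolis-Hastings-within-Gibbs sketch (toy target, not the
# noncompensatory MIRT model from the study): each coordinate of a 2-D
# density is updated in turn with a random-walk MH accept/reject step.

def log_target(x, y):
    # Toy non-conjugate density: exp(-(x^2 + y^2 + x^2 * y^2) / 2)
    return -(x * x + y * y + x * x * y * y) / 2

def mh_within_gibbs(n_draws, step=1.0, seed=0):
    rng = random.Random(seed)
    x = y = 0.0
    draws = []
    for _ in range(n_draws):
        # MH update of x holding y fixed
        xp = x + rng.gauss(0, step)
        if math.log(rng.random()) < log_target(xp, y) - log_target(x, y):
            x = xp
        # MH update of y holding x fixed
        yp = y + rng.gauss(0, step)
        if math.log(rng.random()) < log_target(x, yp) - log_target(x, y):
            y = yp
        draws.append((x, y))
    return draws

draws = mh_within_gibbs(5000)
```

By symmetry of the toy target, the retained chain for either coordinate should average near zero; in an IRT application the per-coordinate updates would instead cycle over person and item parameters.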
Haberman, Shelby J. – Educational Testing Service, 2010
Sampling errors limit the accuracy with which forms can be linked. Limitations on accuracy are especially important in testing programs in which a very large number of forms are employed. Standard inequalities in mathematical statistics may be used to establish lower bounds on the achievable linking accuracy. To illustrate results, a variety of…
Descriptors: Testing Programs, Equated Scores, Sampling, Accuracy
Waller, Niels G. – Applied Psychological Measurement, 2008
Reliability is a property of test scores from individuals who have been sampled from a well-defined population. Reliability indices, such as coefficient alpha and related formulas for internal consistency reliability (KR-20, Hoyt's reliability), yield lower bound reliability estimates when (a) subjects have been sampled from a single population and when…
Descriptors: Test Items, Reliability, Scores, Psychometrics
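The KR-20 lower-bound estimate named in this abstract is computable directly from a 0/1 item-score matrix. A minimal sketch, with made-up data and an illustrative function name:

```python
# KR-20 internal-consistency reliability for dichotomously scored items,
# a lower-bound estimate under the single-population sampling condition
# described in the abstract. Data below are illustrative only.

def kr20(scores):
    """scores: list of examinee rows, each a list of 0/1 item scores."""
    n = len(scores)
    n_items = len(scores[0])
    # Per-item proportion correct p_j; p_j * (1 - p_j) is the item variance
    p = [sum(row[j] for row in scores) / n for j in range(n_items)]
    pq_sum = sum(pj * (1 - pj) for pj in p)
    # Variance of total scores (population formula)
    totals = [sum(row) for row in scores]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    return (n_items / (n_items - 1)) * (1 - pq_sum / var_t)

data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
rel = kr20(data)
```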
Roberts, James S.; Wedell, Douglas H.; Laughlin, James E. – 1998
The Likert rating scale procedure is often used in conjunction with a graded disagree-agree response scale to measure attitudes. Item characteristic curves associated with graded disagree-agree responses are generally single-peaked, nonmonotonic functions of true attitude. These characteristics are, thus, more generally consistent with an…
Descriptors: Attitudes, Likert Scales, Sampling, Test Items

Allen, Nancy L.; Donoghue, John R. – Journal of Educational Measurement, 1996
Examined the effect of complex sampling of items on the measurement of differential item functioning (DIF) using the Mantel-Haenszel procedure through a Monte Carlo study. Suggests the superiority of the pooled booklet method when items are selected for examinees according to a balanced incomplete block design. Discusses implications for other DIF…
Descriptors: Item Bias, Monte Carlo Methods, Research Design, Sampling
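The Mantel-Haenszel procedure used in this DIF study pools a 2x2 table per matching stratum (e.g., per total-score level) into a common odds-ratio estimate. A minimal sketch of that estimator, with illustrative counts rather than data from the study:

```python
import math

# Mantel-Haenszel common odds-ratio estimator for DIF screening.
# Each stratum k contributes a 2x2 table:
#   (a, b) = reference group (correct, incorrect)
#   (c, d) = focal group     (correct, incorrect)

def mh_odds_ratio(tables):
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

tables = [
    (40, 10, 35, 15),  # stratum 1 (illustrative counts)
    (30, 20, 25, 25),  # stratum 2
    (20, 30, 15, 35),  # stratum 3
]
alpha_mh = mh_odds_ratio(tables)
# MH D-DIF statistic on the ETS delta scale
mh_d_dif = -2.35 * math.log(alpha_mh)
```

A value of alpha_mh above 1 favors the reference group, giving a negative D-DIF; the pooled-booklet question in the abstract concerns how the strata are formed when examinees see different item samples.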
Van Onna, Marieke J. H. – Applied Psychological Measurement, 2004
Coefficient "H" is used as an index of scalability in nonparametric item response theory (NIRT). It indicates the degree to which a set of items rank orders examinees. Theoretical sampling distributions, however, have only been derived asymptotically and only under restrictive conditions. Bootstrap methods offer an alternative possibility to…
Descriptors: Sampling, Item Response Theory, Scaling, Comparative Analysis
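Coefficient H and the bootstrap alternative this abstract describes can both be sketched briefly. The code below computes Loevinger's H from Guttman-error counts and resamples examinees to approximate its sampling distribution; the data and function names are illustrative, not from the study:

```python
import random

# Loevinger's scalability coefficient H for dichotomous items, plus a
# nonparametric bootstrap of its sampling distribution (the alternative
# to asymptotic theory discussed in the abstract). Data illustrative.

def coefficient_h(scores):
    n = len(scores)
    m = len(scores[0])
    p = [sum(row[j] for row in scores) / n for j in range(m)]
    obs = exp = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            easy, hard = (i, j) if p[i] >= p[j] else (j, i)
            # Guttman error: harder item passed while easier item failed
            obs += sum(1 for row in scores if row[hard] == 1 and row[easy] == 0)
            # Expected errors under marginal independence
            exp += n * p[hard] * (1 - p[easy])
    return 1 - obs / exp if exp > 0 else float("nan")

def bootstrap_h(scores, reps=100, seed=1):
    rng = random.Random(seed)
    n = len(scores)
    draws = []
    while len(draws) < reps:
        h = coefficient_h([rng.choice(scores) for _ in range(n)])
        if h == h:  # skip degenerate (all-constant) resamples
            draws.append(h)
    return draws

data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 0], [1, 0, 1]]
h = coefficient_h(data)
hs = bootstrap_h(data)
```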
Holland, Paul W. – 1989
A simple technique, developed by A. Phillips (1987) is used to approximate the covariance between the Mantel-Haenszel log-odds-ratio estimator for a 2 x 2 x k table and the sample marginal proportions. These results are then applied to obtain an approximate variance estimate of an adjusted risk difference based on the Mantel-Haenszel odds-ratio…
Descriptors: Difficulty Level, Estimation (Mathematics), Item Bias, Risk

Baker, Frank B. – Applied Psychological Measurement, 1996
Using the characteristic curve method for dichotomously scored test items, the sampling distributions of equating coefficients were examined. Simulations indicate that for the equating conditions studied, the sampling distributions of the equating coefficients appear to have acceptable characteristics, suggesting confidence in the values obtained…
Descriptors: Equated Scores, Item Response Theory, Sampling, Statistical Distributions
Karabatsos, George; Sheu, Ching-Fan – Applied Psychological Measurement, 2004
This study introduces an order-constrained Bayes inference framework useful for analyzing data containing dichotomous scored item responses, under the assumptions of either the monotone homogeneity model or the double monotonicity model of nonparametric item response theory (NIRT). The framework involves the implementation of Gibbs sampling to…
Descriptors: Inferences, Nonparametric Statistics, Item Response Theory, Data Analysis
Skaggs, Gary; Tessema, Aster – 2001
This paper presents an application of the bookmark procedure to a test composed of increasing text difficulty levels. The Test of English Proficiency for Adults (TEPA) was used for this study. Three forms of the TEPA were field tested in 1999 with approximately 1,000 non-native English speaking students enrolled in English-as-a-Second-Language…
Descriptors: Adults, Difficulty Level, English (Second Language), Language Tests

Hanson, Bradley A.; And Others – Applied Psychological Measurement, 1993
The delta method was used to derive standard errors (SEs) of the Levine observed score and Levine true score linear test equating methods using data from two test forms. SEs derived without the normality assumption and bootstrap SEs were very close. The situation with skewed score distributions is also discussed. (SLD)
Descriptors: Equated Scores, Equations (Mathematics), Error of Measurement, Sampling
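The bootstrap standard errors this abstract compares against can be sketched for a simple linear (mean-sigma) observed-score equating function. This is a generic illustration with made-up score data, not the Levine methods or data from the study:

```python
import math
import random

# Bootstrap standard error of a linear (mean-sigma) observed-score
# equating function -- a resampling counterpart to delta-method SEs.
# Score data below are illustrative only.

def mean_sd(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, math.sqrt(v)

def linear_equate(x, form_x, form_y):
    mx, sx = mean_sd(form_x)
    my, sy = mean_sd(form_y)
    return my + (sy / sx) * (x - mx)

def bootstrap_se(x, form_x, form_y, reps=500, seed=0):
    rng = random.Random(seed)
    vals = []
    for _ in range(reps):
        bx = [rng.choice(form_x) for _ in form_x]  # resample Form X takers
        by = [rng.choice(form_y) for _ in form_y]  # resample Form Y takers
        vals.append(linear_equate(x, bx, by))
    m = sum(vals) / len(vals)
    return math.sqrt(sum((v - m) ** 2 for v in vals) / (len(vals) - 1))

form_x = [12, 15, 18, 20, 22, 25, 28, 30, 33, 35]
form_y = [10, 14, 16, 19, 21, 24, 27, 29, 31, 34]
se = bootstrap_se(24, form_x, form_y)
```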
Berger, Martijn P. F. – 1989
The problem of obtaining designs that result in the most precise parameter estimates is encountered in at least two situations where item response theory (IRT) models are used. In so-called two-stage testing procedures, certain designs that match difficulty levels of the test items with the ability of the examinees may be located. Such designs…
Descriptors: Difficulty Level, Efficiency, Equations (Mathematics), Heuristics