ERIC - Search Results

Showing 1 to 15 of 21 results

Accuracy and Sensitivity of Coefficient Alpha and Its Alternatives with Unidimensional and Contaminated Scales

Peer reviewed

Xiao, Leifeng; Hau, Kit-Tai – Applied Measurement in Education, 2023

We compared coefficient alpha with five alternatives (omega total, omega RT, omega h, GLB, and coefficient H) in two simulation studies. Results showed for unidimensional scales, (a) all indices except omega h performed similarly well for most conditions; (b) alpha is still good; (c) GLB and coefficient H overestimated reliability with small…

Descriptors: Test Theory, Test Reliability, Factor Analysis, Test Length

IRT Characteristic Curve Linking Methods Weighted by Information for Mixed-Format Tests

Peer reviewed

Wang, Shaojie; Lee, Won-Chan; Zhang, Minqiang; Yuan, Lixin – Applied Measurement in Education, 2024

To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…

Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement

Applying a Multiple Comparison Control to IRT Item-Fit Testing

Peer reviewed

Sauder, Derek; DeMars, Christine – Applied Measurement in Education, 2020

We used simulation techniques to assess the item-level and familywise Type I error control and power of an IRT item-fit statistic, the S-X². Previous research indicated that the S-X² has good Type I error control and decent power, but no previous research examined familywise Type I error control.…

Descriptors: Item Response Theory, Test Items, Sample Size, Test Length

Subscore Equating and Profile Reporting

Peer reviewed

Lim, Euijin; Lee, Won-Chan – Applied Measurement in Education, 2020

The purpose of this study is to address the necessity of subscore equating and to evaluate the performance of various equating methods for subtests. Assuming the random groups design and number-correct scoring, this paper analyzed real data and simulated data with four study factors including test dimensionality, subtest length, form difference in…

Descriptors: Equated Scores, Test Length, Test Format, Difficulty Level

Are the Nonparametric Person-Fit Statistics More Powerful than Their Parametric Counterparts? Revisiting the Simulations in Karabatsos (2003)

Peer reviewed

Sinharay, Sandip – Applied Measurement in Education, 2017

Karabatsos compared the power of 36 person-fit statistics using receiver operating characteristic curves and found the Hᵀ statistic to be the most powerful in identifying aberrant examinees. He found three statistics, C, MCI, and U3, to be the next most powerful. These four statistics,…

Descriptors: Nonparametric Statistics, Goodness of Fit, Simulation, Comparative Analysis

Item Parameter Drift in a Time-Varying Predictor

Peer reviewed

Lee, HyeSun – Applied Measurement in Education, 2018

The current simulation study examined the effects of Item Parameter Drift (IPD) occurring in a short scale on parameter estimates in multilevel models where scores from a scale were employed as a time-varying predictor to account for outcome scores. Five factors, including three decisions about IPD, were considered for simulation conditions. It…

Descriptors: Test Items, Hierarchical Linear Modeling, Predictor Variables, Scores

Evaluating the Consistency of Angoff-Based Cut Scores Using Subsets of Items within a Generalizability Theory Framework

Peer reviewed

Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015

The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…

Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items

An Empirical Investigation of Methods for Assessing Item Fit for Mixed Format Tests

Peer reviewed

Chon, Kyong Hee; Lee, Won-Chan; Ansley, Timothy N. – Applied Measurement in Education, 2013

Empirical information regarding the performance of model-fit procedures has been a persistent need in measurement practice. Statistical procedures for evaluating item fit were applied to real test examples that consist of both dichotomously and polytomously scored items. The item fit statistics used in this study included PARSCALE's G²,…

Descriptors: Test Format, Test Items, Item Analysis, Goodness of Fit

Investigation of a Nonparametric Procedure for Assessing Goodness-of-Fit in Item Response Theory

Peer reviewed

Wells, Craig S.; Bolt, Daniel M. – Applied Measurement in Education, 2008

Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…

Descriptors: Test Length, Test Items, Monte Carlo Methods, Nonparametric Statistics

Estimating the Internal Consistency Reliability of Tests Composed of Testlets Varying in Length.

Peer reviewed

Feldt, Leonard S. – Applied Measurement in Education, 2002

Considers the degree of bias in testlet-based alpha (internal consistency reliability) through hypothetical examples and real test data from four tests of the Iowa Tests of Basic Skills. Presents a simple formula for computing a testlet-based congeneric coefficient. (SLD)

Descriptors: Estimation (Mathematics), Reliability, Statistical Bias, Test Format

Procedures for Selecting Items for Computerized Adaptive Tests.

Peer reviewed

Kingsbury, G. Gage; Zara, Anthony R. – Applied Measurement in Education, 1989

Several classical approaches and alternative approaches to item selection for computerized adaptive testing (CAT) are reviewed and compared. The study also describes procedures for constrained CAT that may be added to classical item selection approaches to allow them to be used for applied testing. (TJH)

Descriptors: Adaptive Testing, Computer Assisted Testing, Test Construction, Test Length

Assessing the Dimensionality of Item Response Matrices with Small Sample Sizes and Short Test Lengths.

Peer reviewed

De Champlain, Andre; Gessaroli, Marc E. – Applied Measurement in Education, 1998

Type I error rates and rejection rates for three dimensionality assessment procedures were studied with data sets simulated to reflect short tests and small samples. Results show that the G-squared difference test (D. Bock, R. Gibbons, and E. Muraki, 1988) suffered from a severely inflated Type I error rate at all conditions simulated. (SLD)

Descriptors: Item Response Theory, Matrices, Sample Size, Simulation

Simultaneous Use of Multiple Answer Copying Indexes to Improve Detection Rates

Peer reviewed

Wollack, James A. – Applied Measurement in Education, 2006

Many of the currently available statistical indexes to detect answer copying lack sufficient power at small α levels or when the amount of copying is relatively small. Furthermore, there is no one index that is uniformly best. Depending on the type or amount of copying, certain indexes are better than others. The purpose of this article was…

Descriptors: Statistical Analysis, Item Analysis, Test Length, Sample Size

How Big Is Big Enough? Sample Size Requirements for CAST Item Parameter Estimation

Peer reviewed

Chuah, Siang Chee; Drasgow, Fritz; Luecht, Richard – Applied Measurement in Education, 2006

Adaptive tests offer the advantages of reduced test length and increased accuracy in ability estimation. However, adaptive tests require large pools of precalibrated items. This study looks at the development of an item pool for 1 type of adaptive administration: the computer-adaptive sequential test. An important issue is the sample size required…

Descriptors: Test Length, Sample Size, Adaptive Testing, Item Response Theory

Estimating the Reliability of a Test Containing Multiple Item Formats.

Peer reviewed

Qualls, Audrey L. – Applied Measurement in Education, 1995

Classically parallel, tau-equivalently parallel, and congenerically parallel models representing various degrees of part-test parallelism and their appropriateness for tests composed of multiple item formats are discussed. An appropriate reliability estimate for a test with multiple item formats is presented and illustrated. (SLD)

Descriptors: Achievement Tests, Estimation (Mathematics), Measurement Techniques, Test Format
