Showing 1 to 15 of 29 results
Atalay Kabasakal, Kübra; Arsan, Nihan; Gök, Bilge; Kelecioglu, Hülya – Educational Sciences: Theory and Practice, 2014
This simulation study compared the performance (Type I error and power) of the Mantel-Haenszel (MH), SIBTEST, and item response theory likelihood ratio (IRT-LR) methods under certain conditions. The manipulated factors were sample size, ability difference between groups, test length, the percentage of differential item functioning (DIF), and underlying…
Descriptors: Comparative Analysis, Item Response Theory, Statistical Analysis, Test Bias
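Of the three methods compared above, the Mantel-Haenszel procedure is the most widely implemented. As a point of reference, the sketch below computes the MH common odds ratio and chi-square statistic for a single studied item, matching examinees on total score; the function name and data layout are illustrative, not taken from the study.

```python
import numpy as np

def mantel_haenszel_dif(correct, group, match_score):
    """MH DIF check for one studied item.

    correct:     0/1 responses to the studied item
    group:       0 = reference group, 1 = focal group
    match_score: matching criterion (e.g., total test score)
    """
    num = den = 0.0                 # terms of the MH common odds ratio
    sum_a = sum_ea = sum_va = 0.0   # terms of the MH chi-square
    for k in np.unique(match_score):
        m = match_score == k
        a = np.sum((group[m] == 0) & (correct[m] == 1))  # reference, correct
        b = np.sum((group[m] == 0) & (correct[m] == 0))  # reference, incorrect
        c = np.sum((group[m] == 1) & (correct[m] == 1))  # focal, correct
        d = np.sum((group[m] == 1) & (correct[m] == 0))  # focal, incorrect
        t = a + b + c + d
        if t < 2:
            continue                 # score group too sparse to contribute
        num += a * d / t
        den += b * c / t
        n_ref, n_right = a + b, a + c
        sum_a += a
        sum_ea += n_ref * n_right / t                       # E[a] under no DIF
        sum_va += (n_ref * (c + d) * n_right * (t - n_right)
                   / (t**2 * (t - 1)))                      # Var[a] under no DIF
    alpha_mh = num / den             # > 1 favors the reference group
    chi2 = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_va        # continuity-corrected
    return alpha_mh, chi2
```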
Kang, Taehoon; Petersen, Nancy S. – Asia Pacific Education Review, 2012
This paper compares three methods of item calibration--concurrent calibration, separate calibration with linking, and fixed item parameter calibration--that are frequently used for linking item parameters to a base scale. Concurrent and separate calibrations were implemented using BILOG-MG. The Stocking and Lord in "Appl Psychol Measure"…
Descriptors: Methods, Comparative Analysis, Test Items, Item Response Theory
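The Stocking and Lord approach referenced above chooses a slope A and intercept B for the ability scale by minimizing the squared distance between the two forms' test characteristic curves over a grid of ability points. A minimal sketch under the 2PL model, with hypothetical parameter arrays, might look like this:

```python
import numpy as np
from scipy.optimize import minimize

def p2pl(theta, a, b):
    """2PL item response function; rows index theta values, columns items."""
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))

def stocking_lord(a_new, b_new, a_base, b_base,
                  theta=np.linspace(-4, 4, 41)):
    """Find slope A and intercept B placing the new form's anchor items
    on the base scale by matching test characteristic curves."""
    def loss(coef):
        A, B = coef
        tcc_new = p2pl(theta, a_new / A, A * b_new + B).sum(axis=1)
        tcc_base = p2pl(theta, a_base, b_base).sum(axis=1)
        return np.sum((tcc_new - tcc_base) ** 2)
    res = minimize(loss, x0=[1.0, 0.0], method="Nelder-Mead")
    return res.x   # (A, B); transformed parameters: a* = a/A, b* = A*b + B
```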
Wang, Wei – ProQuest LLC, 2013
Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests are often considered superior to tests containing only MC items, although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…
Descriptors: Equated Scores, Test Format, Test Items, Test Length
Carvajal-Espinoza, Jorge E. – ProQuest LLC, 2011
The Non-Equivalent groups with Anchor Test (NEAT) equating design is widely used in large-scale testing; it involves two groups that need not be of equal ability. One group, P, takes form X together with a set of anchor items A, while the other group, Q, takes form Y with the same anchor items A. One of the most commonly used equating methods in…
Descriptors: Sample Size, Equated Scores, Psychometrics, Measurement
Wang, Wen-Chung; Huang, Sheng-Yun – Educational and Psychological Measurement, 2011
The one-parameter logistic model with ability-based guessing (1PL-AG) has recently been developed to account for the effect of ability on guessing behavior in multiple-choice items. In this study, the authors developed algorithms for computerized classification testing under the 1PL-AG and conducted a series of simulations to evaluate their…
Descriptors: Computer Assisted Testing, Classification, Item Analysis, Probability
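The abstract does not restate the model, but in one published parameterization of the 1PL-AG the guessing probability itself rises with ability. The function below implements a response function of that general form; the parameter names are an assumption, not the authors' exact specification.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_1pl_ag(theta, beta, gamma, lam):
    """1PL with ability-based guessing (one common parameterization):
    the examinee either solves the item (plain 1PL term) or, failing
    that, guesses correctly with a probability that increases in theta.

    theta: ability           beta:  item difficulty
    gamma: guessing intercept lam:  weight of ability in guessing
    """
    solve = logistic(theta - beta)          # ordinary 1PL success
    guess = logistic(lam * theta + gamma)   # ability-dependent guessing
    return solve + (1.0 - solve) * guess
```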
Zhang, Jinming; Lu, Ting – ETS Research Report Series, 2007
In practical applications of item response theory (IRT), item parameters are usually estimated first from a calibration sample. After treating these estimates as fixed and known, ability parameters are then estimated. However, the statistical inferences based on the estimated abilities can be misleading if the uncertainty of the item parameter…
Descriptors: Item Response Theory, Ability, Error of Measurement, Maximum Likelihood Statistics
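To make the two-stage practice concrete: after calibration, an examinee's ability is typically found by maximizing the likelihood with the item estimates plugged in as known constants, so their sampling error never enters the ability standard error. A minimal sketch under the 2PL, with hypothetical values:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_lik(theta, responses, a, b):
    """2PL negative log-likelihood of one examinee's 0/1 responses,
    with item parameters (a, b) treated as fixed and known."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

# Hypothetical calibrated item parameters and one response vector
a_hat = np.array([1.2, 0.8, 1.5, 1.0])
b_hat = np.array([-0.5, 0.3, 1.1, 0.0])
y = np.array([1, 1, 0, 1])

theta_mle = minimize_scalar(neg_log_lik, bounds=(-4, 4),
                            args=(y, a_hat, b_hat), method="bounded").x
```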
Clauser, Brian; And Others – 1992
Previous research examining the effects of reducing the number of score groups used in the matching criterion of the Mantel-Haenszel procedure, when screening for differential item functioning, has produced ambiguous results. The goal of this study was to resolve the ambiguity by examining the problem with a simulated data set. The main results…
Descriptors: Ability, Comparative Analysis, Computer Simulation, Item Bias
MacDonald, Paul; Paunonen, Sampo V. – Educational and Psychological Measurement, 2002
Examined the behavior of item and person statistics from item response theory and classical test theory frameworks through Monte Carlo methods with simulated test data. Findings suggest that item difficulty and person ability estimates are highly comparable for both approaches. (SLD)
Descriptors: Ability, Comparative Analysis, Difficulty Level, Item Response Theory
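The comparability the authors report is easy to see in a small simulation: classical item difficulty (the proportion correct) tracks the IRT difficulty parameter closely, if inversely. The sketch below, with arbitrary generating parameters, illustrates the idea:

```python
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items = 1000, 20
theta = rng.normal(size=(n_persons, 1))
a = rng.uniform(0.7, 1.6, size=n_items)        # true 2PL discriminations
b = rng.normal(size=n_items)                   # true 2PL difficulties
p = 1 / (1 + np.exp(-a * (theta - b)))
x = rng.binomial(1, p)                         # simulated 0/1 responses

p_values = x.mean(axis=0)                      # CTT item difficulty
total = x.sum(axis=1)
r_pbis = np.array([np.corrcoef(x[:, j], total)[0, 1]
                   for j in range(n_items)])   # CTT discrimination
print(np.corrcoef(p_values, b)[0, 1])          # strongly negative relation
```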
Whitmore, Marjorie L.; Schumacker, Randall E. – Educational and Psychological Measurement, 1999
Compared differential item functioning detection rates for logistic regression and analysis of variance for dichotomously scored items using simulated data and varying test length, sample size, discrimination rate, and underlying ability. Explains why the logistic regression method is recommended for most applications. (SLD)
Descriptors: Ability, Analysis of Variance, Comparative Analysis, Item Bias
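The recommended logistic regression method models item success from the matching score, group membership, and their interaction; a significant group term signals uniform DIF and a significant interaction signals nonuniform DIF. A minimal sketch with simulated data (the generating coefficients are arbitrary):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
score = rng.integers(0, 41, size=n)            # matching total score
group = rng.integers(0, 2, size=n)             # 0 = reference, 1 = focal
logit = -4 + 0.2 * score + 0.4 * group         # built-in uniform DIF
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([score, group, score * group]))
fit = sm.Logit(y, X).fit(disp=False)
print(fit.params)   # coefficients on group / score*group flag DIF
```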
Zhang, Jinming – ETS Research Report Series, 2005
Lord's bias function and the weighted likelihood estimation method are effective in reducing the bias of the maximum likelihood estimate of an examinee's ability under the assumption that the true item parameters are known. This paper presents simulation studies to determine the effectiveness of these two methods in reducing the bias when the item…
Descriptors: Statistical Bias, Maximum Likelihood Statistics, Computation, Ability
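For context, Warm's weighted likelihood estimator removes much of the MLE's outward bias by maximizing the likelihood times the square root of the test information, which for the 1PL and 2PL is exactly the weighted likelihood correction. A sketch under the 2PL, again treating the item parameters as known and using hypothetical values:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def wle_objective(theta, y, a, b):
    """Negative of (log-likelihood + 0.5 * log information); minimizing
    this gives Warm's weighted likelihood estimate under the 2PL."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    info = np.sum(a**2 * p * (1 - p))           # 2PL test information
    return -(loglik + 0.5 * np.log(info))

a = np.array([1.0, 1.3, 0.7, 1.1])              # hypothetical known params
b = np.array([0.0, -0.8, 0.5, 1.2])
y = np.array([1, 1, 1, 0])
theta_wle = minimize_scalar(wle_objective, bounds=(-4, 4),
                            args=(y, a, b), method="bounded").x
```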
Kirisci, Levent; Hsu, Tse-Chi – 1995
The main goal of this study was to assess how sensitive unidimensional parameter estimates derived from BILOG were when the unidimensionality assumption was violated and the underlying ability distribution was not multivariate normal. A multidimensional three-parameter logistic distribution that was a straightforward generalization of the…
Descriptors: Ability, Comparative Analysis, Correlation, Difficulty Level
Frey, Sharon L. – 1996
The Mantel-Haenszel procedure (N. Mantel and W. Haenszel, 1959) and its extension to constructed-response items, the Generalized Mantel-Haenszel (A. Agresti, 1990), compare the performance of subgroups across different score groups to determine differential item functioning (DIF). At each level of comparison, or score group, the subgroups are…
Descriptors: Ability, Comparative Analysis, Constructed Response, Ethnic Groups
Pommerich, Mary; And Others – 1994
The functioning of two population-based Mantel-Haenszel (MH) common-odds ratios was compared. One ratio is conditioned on the observed test score, while the other is conditioned on a latent trait or true ability score. When the comparison group distributions are incongruent or nonoverlapping to some degree, the observed score represents different…
Descriptors: Ability, Comparative Analysis, Item Bias, Performance
Vale, C. David; Weiss, David J. – 1975
A conventional test and two forms of a stradaptive test were administered to thousands of simulated subjects by minicomputer. Characteristics of the three tests using several scoring techniques were investigated while varying the discriminating power of the items, the lengths of the tests, and the availability of prior information about the…
Descriptors: Ability, Branching, Comparative Analysis, Computer Oriented Programs
Thomasson, Gary L. – 1997
Score comparability is important to those who take tests and those who use them. One important concept related to test score comparability is that of "equity," which is defined as existing when examinees are indifferent as to which of two alternate forms of a test they would prefer to take. By their nature, computerized adaptive tests…
Descriptors: Ability, Adaptive Testing, Comparative Analysis, Computer Assisted Testing