NotesFAQContact Us
Collection
Advanced
Search Tips
Showing all 11 results Save | Export
Peer reviewed Peer reviewed
Noonan, Brian W.; And Others – Applied Psychological Measurement, 1992
Studied the extent to which three appropriateness indexes, Z(sub 3), ECIZ4, and W, are well standardized in a Monte Carlo study. The ECIZ4 most closely approximated a normal distribution, and its skewness and kurtosis were more stable and less affected by test length and item response theory model than the others. (SLD)
Descriptors: Comparative Analysis, Item Response Theory, Mathematical Models, Maximum Likelihood Statistics
Cohen, Allan S.; Kim, Seock-Ho – 1993
Equating tests from different calibrations under item response theory (IRT) requires calculation of the slope and intercept of the appropriate linear transformation. Two methods have been proposed recently for equating graded response items under IRT, a test characteristic curve method and a minimum chi-square method. These two methods are…
Descriptors: Chi Square, Comparative Analysis, Computer Simulation, Equated Scores
Kim, Seock-Ho; And Others – 1992
Hierarchical Bayes procedures were compared for estimating item and ability parameters in item response theory. Simulated data sets from the two-parameter logistic model were analyzed using three different hierarchical Bayes procedures: (1) the joint Bayesian with known hyperparameters (JB1); (2) the joint Bayesian with information hyperpriors…
Descriptors: Ability, Bayesian Statistics, Comparative Analysis, Equations (Mathematics)
Peer reviewed Peer reviewed
Donoghue, John R.; Allen, Nancy L. – Journal of Educational Statistics, 1993
Forming the matching variable for the Mantel-Haenszel differential item functioning (DIF) procedure through use of the total score as the matching variable (thin) and forming the matching variable by pooling total score levels (thick) were compared in a Monte Carlo study. Reasons thick matching is superior are discussed. (SLD)
Descriptors: Comparative Analysis, Computer Simulation, Equations (Mathematics), Graphs
Eignor, Daniel R.; Hambleton, Ronald K. – 1979
The purpose of the investigation was to obtain some relationships among (1) test lengths, (2) shape of domain-score distributions, (3) advancement scores, and (4) several criterion-referenced test score reliability and validity indices. The study was conducted using computer simulation methods. The values of variables under study were set to be…
Descriptors: Comparative Analysis, Computer Assisted Testing, Criterion Referenced Tests, Cutting Scores
Schaefer, Mary M.; Gross, Susan K. – 1983
Viewing the reliability for criterion-referenced tests as that of mastery classification decisions, three models for determining reliability were examined using two test administrations so that two estimates could be compared to a standard. A major purpose of the research was to determine how several reliability coefficients (coefficient kappa, an…
Descriptors: Comparative Analysis, Correlation, Criterion Referenced Tests, Cutting Scores
McKinley, Robert L.; Reckase, Mark D. – 1981
A study was conducted to compare tailored testing procedures based on a Bayesian ability estimation technique and on a maximum likelihood ability estimation technique. The Bayesian tailored testing procedure selected items so as to minimize the posterior variance of the ability estimate distribution, while the maximum likelihood tailored testing…
Descriptors: Academic Ability, Adaptive Testing, Bayesian Statistics, Comparative Analysis
Steinheiser, Frederick H., Jr.; And Others – 1978
Alternative mathematical models for scoring and decision making with criterion referenced tests are described, especially as they concern appropriate test length and methods of establishing statistically valid cutting scores. Several of these approaches are reviewed and compared on formal-analytic and empirical grounds: (1) Block's approach to…
Descriptors: Comparative Analysis, Criterion Referenced Tests, Cutting Scores, Decision Making
Haladyna, Tom; Roid, Gale – 1980
The problems associated with misclassifying students when pass-fail decisions are based on test scores are discussed. One protection against misclassification is to set a confidence interval around the cutting score. Those whose scores fall above the interval are passed; those whose scores fall below the interval are failed; and those whose scores…
Descriptors: Bayesian Statistics, Classification, Comparative Analysis, Criterion Referenced Tests
Hutten, Leah R. – 1979
Goodness of fit of raw test score data were compared, using two latent trait models: the Rasch model and the Birnbaum three-parameter logistic model. Data were taken from various achievement tests and the Scholastic Aptitude Test (Verbal). A minimum sample size of 1,000 was required, and the minimum test length was 40 items. Results indicated that…
Descriptors: Ability Identification, Achievement Tests, College Entrance Examinations, Comparative Analysis
Frick, Theodore W. – 1986
The sequential probability ratio test (SPRT), developed by Abraham Wald, is one statistical model available for making mastery decisions during computer-based criterion referenced tests. The predictive validity of the SPRT was empirically investigated with two different and relatively large item pools with heterogeneous item parameters. Graduate…
Descriptors: Achievement Tests, Adaptive Testing, Classification, Comparative Analysis