Showing all 14 results
Peer reviewed
Mingfeng Xue; Ping Chen – Journal of Educational Measurement, 2025
Response styles pose a serious threat to psychological measurement. This research compares IRTree models and anchoring vignettes in addressing response styles and estimating the target traits. It also explores the potential of combining them at the item level and at the total-score level (ratios of extreme and middle responses to vignettes). Four models…
Descriptors: Item Response Theory, Models, Comparative Analysis, Vignettes
Peer reviewed
Wang, Wenyi; Song, Lihong; Chen, Ping; Meng, Yaru; Ding, Shuliang – Journal of Educational Measurement, 2015
Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern-level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, the indices at the attribute level have not yet…
Descriptors: Classification, Reliability, Accuracy, Cognitive Tests
Peer reviewed
Schroeders, Ulrich; Robitzsch, Alexander; Schipolowski, Stefan – Journal of Educational Measurement, 2014
C-tests are a specific variant of cloze tests that are considered time-efficient, valid indicators of general language proficiency. They are commonly analyzed with models of item response theory assuming local item independence. In this article we estimated local interdependencies for 12 C-tests and compared the changes in item difficulties,…
Descriptors: Comparative Analysis, Psychometrics, Cloze Procedure, Language Tests
Peer reviewed
Kim, Sooyeon; von Davier, Alina A.; Haberman, Shelby – Journal of Educational Measurement, 2008
This study addressed the sampling error and linking bias that occur with small samples in a nonequivalent groups anchor test design. We proposed a linking method called the synthetic function, which is a weighted average of the identity function and a traditional equating function (in this case, the chained linear equating function). Specifically,…
Descriptors: Equated Scores, Sample Size, Test Reliability, Comparative Analysis
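The synthetic function described in the Kim, von Davier, and Haberman abstract above is, in words, a weighted average of the identity function and a chained linear equating function. A minimal sketch, assuming a user-supplied weight and purely illustrative linear coefficients (none of these values come from the study):

```python
# Hedged sketch of a "synthetic" linking function: a weighted average of the
# identity function and a chained linear equating function. The weight w and
# the linear coefficients are illustrative assumptions, not study values.

def chained_linear(x: float, slope: float, intercept: float) -> float:
    """Chained linear equating of a Form X score onto the Form Y scale."""
    return slope * x + intercept

def synthetic_link(x: float, w: float, slope: float, intercept: float) -> float:
    """Weighted average of the traditional equating function and the identity."""
    return w * chained_linear(x, slope, intercept) + (1.0 - w) * x

# With w = 0.5, a raw score of 20 lands halfway between itself and its linear link.
print(synthetic_link(20.0, w=0.5, slope=1.1, intercept=-3.0))  # 19.5
```

The apparent rationale, per the abstract, is that pulling the link toward the identity stabilizes equating when the sample is too small to estimate the traditional function reliably.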
Peer reviewed
Lord, Frederic M. – Journal of Educational Measurement, 1974
When comparing two tests that measure the same trait, separate comparisons should be made at different levels of the trait. A simple, practical, approximate formula is given for doing this. The adequacy of the approximation is illustrated using data comparing seven nationally known sixth-grade reading tests. (Author/RC)
Descriptors: Ability Identification, Comparative Analysis, Reading Tests, Statistical Analysis
Peer reviewed
Crehan, Kevin D. – Journal of Educational Measurement, 1974
Various item selection techniques are compared on criterion-referenced reliability and validity. Techniques compared include three nominal criterion-referenced methods, a traditional point biserial selection, teacher selection, and random selection. (Author)
Descriptors: Comparative Analysis, Criterion Referenced Tests, Item Analysis, Item Banks
Peer reviewed
Ebel, Robert L. – Journal of Educational Measurement, 1975
Descriptors: Comparative Analysis, Multiple Choice Tests, Objective Tests, Teachers
Peer reviewed
Huynh, Huynh; Saunders, Joseph C. – Journal of Educational Measurement, 1980
Single administration (beta-binomial) estimates for the raw agreement index p and the corrected-for-chance kappa index in mastery testing are compared with those based on two test administrations in terms of estimation bias and sampling variability. Bias is about 2.5 percent for p and 10 percent for kappa. (Author/RL)
Descriptors: Comparative Analysis, Error of Measurement, Mastery Tests, Mathematical Models
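For readers unfamiliar with the two agreement indices named in the Huynh and Saunders abstract above, the corrected-for-chance kappa relates to the raw agreement index p through the usual chance correction. A minimal illustration, where the chance-agreement value is an assumed input rather than the article's beta-binomial estimate:

```python
# Illustrative relation between raw agreement p and corrected-for-chance kappa.
# p_chance is supplied directly; the single-administration (beta-binomial)
# estimation discussed in the article is not reproduced here.

def kappa(p: float, p_chance: float) -> float:
    """Corrected-for-chance agreement: (p - p_c) / (1 - p_c)."""
    return (p - p_chance) / (1.0 - p_chance)

print(kappa(p=0.85, p_chance=0.60))  # 0.625
```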
Peer reviewed
Kolen, Michael J.; Whitney, Douglas R. – Journal of Educational Measurement, 1982
The adequacy of equipercentile, linear, one-parameter (Rasch), and three-parameter logistic item-response theory procedures for equating 12 forms of five tests of general educational development was compared. Results indicated that the adequacy of an equating method depends on a variety of factors such as test characteristics, equating design, and sample…
Descriptors: Achievement Tests, Comparative Analysis, Equated Scores, Equivalency Tests
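Of the four procedures compared in the Kolen and Whitney abstract above, the equipercentile method is the most direct to illustrate: a Form X score is mapped to the Form Y score holding the same percentile rank. A rough sketch on made-up score vectors, omitting the smoothing and continuization a real application would require:

```python
# Rough equipercentile mapping on made-up data: send a Form X score to the
# Form Y score with the same percentile rank. Illustration only.
import numpy as np

rng = np.random.default_rng(1)
form_x = rng.binomial(50, 0.55, size=1000)   # hypothetical Form X number-correct scores
form_y = rng.binomial(50, 0.60, size=1000)   # hypothetical (slightly easier) Form Y scores

def equipercentile(x: int) -> float:
    rank = np.mean(form_x <= x)               # percentile rank of x on Form X
    return float(np.quantile(form_y, rank))   # Form Y score at the same rank

print(equipercentile(28))
```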
Peer reviewed
Frary, Robert B. – Journal of Educational Measurement, 1985
Responses to a sample test were simulated for examinees under free-response and multiple-choice formats. Test score sets were correlated with randomly generated sets of unit-normal measures. The superiority of free-response tests was small enough that other considerations might justifiably dictate the choice of format. (Author/DWH)
Descriptors: Comparative Analysis, Computer Simulation, Essay Tests, Guessing (Tests)
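A toy simulation in the spirit of the Frary abstract above, with all design choices (item count, knowledge model, four options) assumed for illustration: free-response scores credit only known items, multiple-choice scores also credit lucky guesses, and both are correlated with the unit-normal criterion.

```python
# Toy simulation: compare how strongly free-response and multiple-choice
# number-correct scores correlate with unit-normal "true" measures.
# All parameters below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n_examinees, n_items, n_options = 2000, 40, 4

theta = rng.standard_normal(n_examinees)                 # unit-normal criterion measures
p_know = 1.0 / (1.0 + np.exp(-theta[:, None]))           # assumed prob. of knowing each item

knows = rng.random((n_examinees, n_items)) < p_know
fr_score = knows.sum(axis=1)                             # free response: credit only if known
lucky = rng.random((n_examinees, n_items)) < 1.0 / n_options
mc_score = (knows | lucky).sum(axis=1)                   # multiple choice: known or guessed

print(np.corrcoef(theta, fr_score)[0, 1])
print(np.corrcoef(theta, mc_score)[0, 1])
```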
Peer reviewed
Frisbie, David A.; Sweeney, Daryl C. – Journal of Educational Measurement, 1982
A 100-item, five-choice multiple-choice (MC) biology final exam was converted to multiple true-false (MTF) form to yield two content-parallel test forms composed of the two item types. Students found the MTF items easier and preferred MTF over MC; the MTF subtests were more reliable. (Author/GK)
Descriptors: Biology, College Science, Comparative Analysis, Difficulty Level
Peer reviewed
Marsh, Herbert W. – Journal of Educational Measurement, 1993
Structural equation models of the same construct measured on different occasions are evaluated in two studies: evaluations of 157 college instructors over 8 years, and data on more than 2,200 high school students over 4 years from the Youth in Transition Study. Results challenge overreliance on simplex models. (SLD)
Descriptors: College Faculty, Comparative Analysis, High School Students, High Schools
Peer reviewed
Kansup, Wanlop; Hakstian, A. Ralph – Journal of Educational Measurement, 1975
The effects on reliability and validity of logically weighting incorrect item options in conventional tests, and of different scoring functions with confidence tests, were examined. Ninth graders took conventionally administered Verbal and Mathematical Reasoning tests, scored conventionally and by a procedure assigning degree-of-correctness weights to…
Descriptors: Comparative Analysis, Confidence Testing, Junior High School Students, Multiple Choice Tests
Peer reviewed
Hakstian, A. Ralph; Kansup, Wanlop – Journal of Educational Measurement, 1975
A comparison of reliability and validity was made for three testing procedures: 1) responding conventionally to Verbal Ability and Mathematical Reasoning tests; 2) using a confidence weighting response procedure with the same tests; and 3) using the elimination response method. The experimental testing procedures were not psychometrically superior…
Descriptors: Comparative Analysis, Confidence Testing, Guessing (Tests), Junior High School Students