Thompson, W. Jake; Nash, Brooke; Clark, Amy K.; Hoover, Jeffrey C. – Journal of Educational Measurement, 2023
As diagnostic classification models become more widely used in large-scale operational assessments, we must give consideration to the methods for estimating and reporting reliability. Researchers must explore alternatives to traditional reliability methods that are consistent with the design, scoring, and reporting levels of diagnostic assessment…
Descriptors: Diagnostic Tests, Simulation, Test Reliability, Accuracy
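
As an illustration of the kind of attribute-level reliability index this abstract alludes to (not the authors' procedure), here is a minimal sketch: given posterior probabilities of attribute mastery from a fitted diagnostic classification model, the expected agreement of two independent classifications based on the same posterior p is p^2 + (1 - p)^2, and averaging over examinees gives a classification-consistency estimate. The beta-distributed posteriors below are purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical posterior probabilities of attribute mastery for 1,000
# examinees; in practice these come from a fitted diagnostic
# classification model.
posterior = rng.beta(2, 2, size=1000)

# Attribute-level classification consistency: the chance that two
# independent classifications based on the same posterior p agree is
# p**2 + (1 - p)**2; averaging over examinees gives the index.
consistency = np.mean(posterior**2 + (1 - posterior) ** 2)
print(f"attribute-level classification consistency: {consistency:.3f}")
```
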
van der Palm, Daniël W.; van der Ark, L. Andries; Sijtsma, Klaas – Journal of Educational Measurement, 2014
The latent class reliability coefficient (LCRC) is improved by using the divisive latent class model instead of the unrestricted latent class model. This results in the divisive latent class reliability coefficient (DLCRC), which, unlike LCRC, avoids making subjective decisions about the best solution and thus avoids judgment error. A computational…
Descriptors: Test Reliability, Scores, Computation, Simulation
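
Computing LCRC or DLCRC requires fitting latent class models, but the underlying idea, reliability as the parallel-test correlation implied by a latent class model, can be sketched by Monte Carlo. The two-class parameters below are hypothetical, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(21)

# Hypothetical 2-class model: class proportions and per-item success
# probabilities (rows are latent classes, columns are four items).
pi = np.array([0.4, 0.6])
pj = np.array([[0.20, 0.30, 0.25, 0.35],
               [0.80, 0.70, 0.75, 0.90]])

cls = rng.choice(2, size=50_000, p=pi)   # class membership held fixed

def administer():
    """One parallel administration given the fixed class memberships."""
    return (rng.random((50_000, 4)) < pj[cls]).sum(axis=1)

# Reliability as the correlation between two parallel administrations
# implied by the latent class model (Monte Carlo approximation).
print(round(np.corrcoef(administer(), administer())[0, 1], 3))
```
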
Wang, Wenyi; Song, Lihong; Chen, Ping; Meng, Yaru; Ding, Shuliang – Journal of Educational Measurement, 2015
Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern-level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, the indices at the attribute level have not yet…
Descriptors: Classification, Reliability, Accuracy, Cognitive Tests
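
A hedged sketch of the distinction this abstract draws: with independent per-attribute classification error, agreement computed attribute by attribute is systematically higher than agreement on the whole mastery pattern. The accuracy value and attribute count below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)
n, k, acc = 10_000, 3, 0.9   # examinees, attributes, per-attribute accuracy

truth = rng.integers(0, 2, size=(n, k))   # true mastery patterns

def classify():
    """One noisy classification: correct per attribute with prob acc."""
    return np.where(rng.random((n, k)) < acc, truth, 1 - truth)

test, retest = classify(), classify()                # independent replications
attr_level = (test == retest).mean(axis=0)           # one index per attribute
pattern_level = (test == retest).all(axis=1).mean()  # whole pattern must match
print(attr_level.round(3), round(pattern_level, 3))  # pattern level is lower
```
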
Wang, Shiyu; Lin, Haiyan; Chang, Hua-Hua; Douglas, Jeff – Journal of Educational Measurement, 2016
Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing. Though both CAT and MST designs have shown strengths and weaknesses in recent large-scale implementations, there is no simple answer to the question of which design is better because different…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Sequential Approach
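
The core operational difference between the two designs can be sketched: CAT selects each next item to maximize Fisher information at the current ability estimate, whereas MST routes examinees among preassembled modules. This is a generic 2PL item-selection step with a hypothetical item bank, not either design from the study.

```python
import numpy as np

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

rng = np.random.default_rng(1)
a = rng.uniform(0.5, 2.0, 200)   # hypothetical discriminations
b = rng.normal(0.0, 1.0, 200)    # hypothetical difficulties

# CAT: pick the single most informative item at the current ability
# estimate; MST would instead route among preassembled modules.
theta_hat = 0.3
next_item = int(np.argmax(info_2pl(theta_hat, a, b)))
print(next_item, round(info_2pl(theta_hat, a[next_item], b[next_item]), 3))
```
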
Schroeders, Ulrich; Robitzsch, Alexander; Schipolowski, Stefan – Journal of Educational Measurement, 2014
C-tests are a specific variant of cloze tests that are considered time-efficient, valid indicators of general language proficiency. They are commonly analyzed with models of item response theory assuming local item independence. In this article we estimated local interdependencies for 12 C-tests and compared the changes in item difficulties,…
Descriptors: Comparative Analysis, Psychometrics, Cloze Procedure, Language Tests
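
Local item dependence of the kind described is often screened with Yen's Q3, the correlation matrix of item residuals after removing model-implied probabilities. A minimal sketch, assuming known Rasch parameters and an artificially induced dependence between the first two items (the study's C-test data and models are more elaborate):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
theta = rng.normal(size=n)
b = np.array([-1.0, -0.5, 0.0, 0.5])          # assumed difficulties
p = 1 / (1 + np.exp(-(theta[:, None] - b)))   # Rasch probabilities
u = (rng.random((n, 4)) < p).astype(float)

# Induce local dependence: item 1 sometimes just copies item 0, as when
# neighboring gaps in a C-test share a solution.
copy = rng.random(n) < 0.3
u[copy, 1] = u[copy, 0]

resid = u - p                          # model residuals
q3 = np.corrcoef(resid, rowvar=False)  # Yen's Q3 matrix
print(q3.round(2))                     # the (0, 1) entry stands out
```
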
Jin, Kuan-Yu; Wang, Wen-Chung – Journal of Educational Measurement, 2014
Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limits), resulting in poor performance on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
Descriptors: Student Evaluation, Item Response Theory, Models, Simulation
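
One way to see the distortion described here is to simulate declining effort by item position: with some probability, a response is random rather than model-driven. In this hedged sketch (all parameter values assumed), low-effort end-of-test items show weakened item-total correlations that a standard IRT model would misread as item properties.

```python
import numpy as np

rng = np.random.default_rng(3)
n_persons, n_items = 2000, 40
theta = rng.normal(size=n_persons)
b = np.linspace(-2, 2, n_items)
p_model = 1 / (1 + np.exp(-(theta[:, None] - b)))   # Rasch, full effort

# Assumed effort curve: later items are attempted with full effort less
# often; a non-effortful response is random at the 0.25 chance level.
effort = np.clip(1.1 - 0.02 * np.arange(n_items), 0.0, 1.0)
u = (rng.random((n_persons, n_items)) <
     effort * p_model + (1 - effort) * 0.25).astype(float)

# End-of-test items carry less information about ability.
total = u.sum(axis=1)
itc = np.array([np.corrcoef(u[:, j], total)[0, 1] for j in range(n_items)])
print(f"item-total r, first 10 items: {itc[:10].mean():.2f}, "
      f"last 10: {itc[-10:].mean():.2f}")
```
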

Beuchert, A. Kent; Mendoza, Jorge L. – Journal of Educational Measurement, 1979
Ten item discrimination indices were compared across a variety of item analysis situations, based on the validities of tests constructed by using each index to select 40 items from a 100-item pool. Item score data were generated by a computer program and included a simulation of guessing. (Author/CTM)
Descriptors: Item Analysis, Simulation, Statistical Analysis, Test Construction
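
As a sketch of one such index (not necessarily among the ten compared), the uncorrected point-biserial correlation between item score and total score can drive the same select-40-from-100 procedure; the data-generating values below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(11)
n_persons, pool_size, keep = 500, 100, 40
theta = rng.normal(size=n_persons)
b = rng.normal(size=pool_size)   # hypothetical difficulties
u = (rng.random((n_persons, pool_size)) <
     1 / (1 + np.exp(-(theta[:, None] - b)))).astype(float)

total = u.sum(axis=1)
# Point-biserial discrimination: correlation of each item with the total
# score (uncorrected for part-whole overlap).
disc = np.array([np.corrcoef(u[:, j], total)[0, 1] for j in range(pool_size)])
selected = np.argsort(disc)[-keep:]   # the 40 most discriminating items

# Validity of the assembled test: correlation of its score with ability.
print(round(np.corrcoef(u[:, selected].sum(axis=1), theta)[0, 1], 3))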

Rudner, Lawrence M. – Journal of Educational Measurement, 1983
Nine indices for assessing the accuracy of an individual's test score were evaluated using simulated item responses to a commercial and a classroom test. The indices appear capable of identifying relatively high proportions of examinees with spurious total scores. (Author/PN)
Descriptors: Correlation, Item Analysis, Latent Trait Theory, Measurement Techniques
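
One classic index of this type (not necessarily among the nine evaluated) is the standardized log-likelihood person-fit statistic l_z. A minimal sketch with assumed Rasch probabilities; a scrambled copy of the same responses keeps the total score but typically yields a markedly negative l_z.

```python
import numpy as np

def lz(u, p):
    """Standardized log-likelihood person-fit statistic for one response
    vector u, given model-implied probabilities p."""
    logit = np.log(p / (1 - p))
    loglik = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
    expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    variance = np.sum(p * (1 - p) * logit**2)
    return (loglik - expected) / np.sqrt(variance)

rng = np.random.default_rng(5)
p = 1 / (1 + np.exp(-(0.5 - np.linspace(-2, 2, 20))))   # assumed Rasch probs

normal = (rng.random(20) < p).astype(float)   # model-consistent responses
spurious = rng.permutation(normal)            # scrambled, same total score
print(round(lz(normal, p), 2), round(lz(spurious, p), 2))
```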

Frary, Robert B. – Journal of Educational Measurement, 1985
Responses to a sample test were simulated for examinees under free-response and multiple-choice formats. Test score sets were correlated with randomly generated sets of unit-normal measures. The superiority of free-response tests was small enough that other considerations might justifiably dictate format choice. (Author/DWH)
Descriptors: Comparative Analysis, Computer Simulation, Essay Tests, Guessing (Tests)
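
The simulation design described lends itself to a compact sketch: generate unit-normal abilities, score a free-response format on knowledge alone and a multiple-choice format with a guessing floor, then compare the score-criterion correlations. All parameters below are assumptions, not Frary's values.

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 5000, 30
ability = rng.normal(size=n)   # unit-normal criterion measures
b = rng.normal(size=k)
p_know = 1 / (1 + np.exp(-(ability[:, None] - b)))   # chance of knowing

free = rng.random((n, k)) < p_know                       # free response
mc = rng.random((n, k)) < p_know + (1 - p_know) * 0.25   # know or guess

for name, resp in (("free-response", free), ("multiple-choice", mc)):
    r = np.corrcoef(resp.sum(axis=1), ability)[0, 1]
    print(f"{name}: r = {r:.3f}")   # the free-response edge is modest
```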

Birenbaum, Menucha; Tatsuoka, Kikumi – Journal of Educational Measurement, 1982
Empirical results from two studies, a simulation study and an experimental one, indicated that, in achievement data of the problem-solving type where a specific subject matter area is being tested, the greater the variety of the algorithms used, the higher the dimensionality of the test data. (Author/PN)
Descriptors: Achievement Tests, Algorithms, Data Analysis, Factor Structure
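
The dimensionality claim can be illustrated by simulating items that load on one versus several latent "algorithm" dimensions and inspecting the eigenvalues of the inter-item correlation matrix, a crude stand-in for the factor analyses such studies use; everything below is a hypothetical setup.

```python
import numpy as np

rng = np.random.default_rng(13)
n, k = 2000, 20

def simulate(n_algorithms):
    # Each item is solved by one of n_algorithms latent skill dimensions;
    # more algorithms means more dimensions in the response data.
    skills = rng.normal(size=(n, n_algorithms))
    assignment = rng.integers(0, n_algorithms, size=k)
    p = 1 / (1 + np.exp(-skills[:, assignment]))
    return (rng.random((n, k)) < p).astype(float)

for s in (1, 3):
    eig = np.linalg.eigvalsh(np.corrcoef(simulate(s), rowvar=False))[::-1]
    print(f"{s} algorithm(s), leading eigenvalues: {eig[:4].round(2)}")
```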