Thompson, W. Jake; Nash, Brooke; Clark, Amy K.; Hoover, Jeffrey C. – Journal of Educational Measurement, 2023
As diagnostic classification models become more widely used in large-scale operational assessments, we must give consideration to the methods for estimating and reporting reliability. Researchers must explore alternatives to traditional reliability methods that are consistent with the design, scoring, and reporting levels of diagnostic assessment…
Descriptors: Diagnostic Tests, Simulation, Test Reliability, Accuracy
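
As an illustration of the kind of attribute-level reliability index this abstract alludes to (not the authors' procedure), here is a minimal sketch: given posterior probabilities of attribute mastery from a fitted diagnostic classification model, the expected agreement of two independent classifications based on the same posterior p is p^2 + (1 - p)^2, and averaging over examinees gives a classification-consistency estimate. The beta-distributed posteriors below are purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical posterior probabilities of attribute mastery for 1,000
# examinees; in practice these come from a fitted diagnostic
# classification model.
posterior = rng.beta(2, 2, size=1000)

# Attribute-level classification consistency: the chance that two
# independent classifications based on the same posterior p agree is
# p**2 + (1 - p)**2; averaging over examinees gives the index.
consistency = np.mean(posterior**2 + (1 - posterior) ** 2)
print(f"attribute-level classification consistency: {consistency:.3f}")
```
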
van der Palm, Daniël W.; van der Ark, L. Andries; Sijtsma, Klaas – Journal of Educational Measurement, 2014
The latent class reliability coefficient (LCRC) is improved by using the divisive latent class model instead of the unrestricted latent class model. This results in the divisive latent class reliability coefficient (DLCRC), which, unlike LCRC, avoids making subjective decisions about the best solution and thus avoids judgment error. A computational…
Descriptors: Test Reliability, Scores, Computation, Simulation
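
Computing LCRC or DLCRC requires fitting latent class models, but the underlying idea, reliability as the parallel-test correlation implied by a latent class model, can be sketched by Monte Carlo. The two-class parameters below are hypothetical, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(21)

# Hypothetical 2-class model: class proportions and per-item success
# probabilities (rows are latent classes, columns are four items).
pi = np.array([0.4, 0.6])
pj = np.array([[0.20, 0.30, 0.25, 0.35],
               [0.80, 0.70, 0.75, 0.90]])

cls = rng.choice(2, size=50_000, p=pi)   # class membership held fixed

def administer():
    """One parallel administration given the fixed class memberships."""
    return (rng.random((50_000, 4)) < pj[cls]).sum(axis=1)

# Reliability as the correlation between two parallel administrations
# implied by the latent class model (Monte Carlo approximation).
print(round(np.corrcoef(administer(), administer())[0, 1], 3))
```
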
Wang, Wenyi; Song, Lihong; Chen, Ping; Meng, Yaru; Ding, Shuliang – Journal of Educational Measurement, 2015
Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern-level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, the indices at the attribute level have not yet…
Descriptors: Classification, Reliability, Accuracy, Cognitive Tests
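
A hedged sketch of the distinction this abstract draws: with independent per-attribute classification error, agreement computed attribute by attribute is systematically higher than agreement on the whole mastery pattern. The accuracy value and attribute count below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)
n, k, acc = 10_000, 3, 0.9   # examinees, attributes, per-attribute accuracy

truth = rng.integers(0, 2, size=(n, k))   # true mastery patterns

def classify():
    """One noisy classification: correct per attribute with prob acc."""
    return np.where(rng.random((n, k)) < acc, truth, 1 - truth)

test, retest = classify(), classify()                # independent replications
attr_level = (test == retest).mean(axis=0)           # one index per attribute
pattern_level = (test == retest).all(axis=1).mean()  # whole pattern must match
print(attr_level.round(3), round(pattern_level, 3))  # pattern level is lower
```
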
Wang, Shiyu; Lin, Haiyan; Chang, Hua-Hua; Douglas, Jeff – Journal of Educational Measurement, 2016
Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing. Though both CAT and MST designs have shown strengths and weaknesses in recent large-scale implementations, there is no simple answer to the question of which design is better because different…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Sequential Approach
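
The core operational difference between the two designs can be sketched: CAT selects each next item to maximize Fisher information at the current ability estimate, whereas MST routes examinees among preassembled modules. This is a generic 2PL item-selection step with a hypothetical item bank, not either design from the study.

```python
import numpy as np

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

rng = np.random.default_rng(1)
a = rng.uniform(0.5, 2.0, 200)   # hypothetical discriminations
b = rng.normal(0.0, 1.0, 200)    # hypothetical difficulties

# CAT: pick the single most informative item at the current ability
# estimate; MST would instead route among preassembled modules.
theta_hat = 0.3
next_item = int(np.argmax(info_2pl(theta_hat, a, b)))
print(next_item, round(info_2pl(theta_hat, a[next_item], b[next_item]), 3))
```
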
Schroeders, Ulrich; Robitzsch, Alexander; Schipolowski, Stefan – Journal of Educational Measurement, 2014
C-tests are a specific variant of cloze tests that are considered time-efficient, valid indicators of general language proficiency. They are commonly analyzed with models of item response theory assuming local item independence. In this article we estimated local interdependencies for 12 C-tests and compared the changes in item difficulties,…
Descriptors: Comparative Analysis, Psychometrics, Cloze Procedure, Language Tests
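
Local item dependence of the kind described is often screened with Yen's Q3, the correlation matrix of item residuals after removing model-implied probabilities. A minimal sketch, assuming known Rasch parameters and an artificially induced dependence between the first two items (the study's C-test data and models are more elaborate):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
theta = rng.normal(size=n)
b = np.array([-1.0, -0.5, 0.0, 0.5])          # assumed difficulties
p = 1 / (1 + np.exp(-(theta[:, None] - b)))   # Rasch probabilities
u = (rng.random((n, 4)) < p).astype(float)

# Induce local dependence: item 1 sometimes just copies item 0, as when
# neighboring gaps in a C-test share a solution.
copy = rng.random(n) < 0.3
u[copy, 1] = u[copy, 0]

resid = u - p                          # model residuals
q3 = np.corrcoef(resid, rowvar=False)  # Yen's Q3 matrix
print(q3.round(2))                     # the (0, 1) entry stands out
```
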
Jin, Kuan-Yu; Wang, Wen-Chung – Journal of Educational Measurement, 2014
Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limits), resulting in poor performance on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
Descriptors: Student Evaluation, Item Response Theory, Models, Simulation
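
One way to see the distortion described here is to simulate declining effort by item position: with some probability, a response is random rather than model-driven. In this hedged sketch (all parameter values assumed), low-effort end-of-test items show weakened item-total correlations that a standard IRT model would misread as item properties.

```python
import numpy as np

rng = np.random.default_rng(3)
n_persons, n_items = 2000, 40
theta = rng.normal(size=n_persons)
b = np.linspace(-2, 2, n_items)
p_model = 1 / (1 + np.exp(-(theta[:, None] - b)))   # Rasch, full effort

# Assumed effort curve: later items are attempted with full effort less
# often; a non-effortful response is random at the 0.25 chance level.
effort = np.clip(1.1 - 0.02 * np.arange(n_items), 0.0, 1.0)
u = (rng.random((n_persons, n_items)) <
     effort * p_model + (1 - effort) * 0.25).astype(float)

# End-of-test items carry less information about ability.
total = u.sum(axis=1)
itc = np.array([np.corrcoef(u[:, j], total)[0, 1] for j in range(n_items)])
print(f"item-total r, first 10 items: {itc[:10].mean():.2f}, "
      f"last 10: {itc[-10:].mean():.2f}")
```
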

Beuchert, A. Kent; Mendoza, Jorge L. – Journal of Educational Measurement, 1979
Ten item discrimination indices were compared across a variety of item analysis situations, based on the validities of tests constructed by using each index to select 40 items from a 100-item pool. Item score data were generated by a computer program and included a simulation of guessing. (Author/CTM)
Descriptors: Item Analysis, Simulation, Statistical Analysis, Test Construction
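
As a sketch of one such index (not necessarily among the ten compared), the uncorrected point-biserial correlation between item score and total score can drive the same select-40-from-100 procedure; the data-generating values below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(11)
n_persons, pool_size, keep = 500, 100, 40
theta = rng.normal(size=n_persons)
b = rng.normal(size=pool_size)   # hypothetical difficulties
u = (rng.random((n_persons, pool_size)) <
     1 / (1 + np.exp(-(theta[:, None] - b)))).astype(float)

total = u.sum(axis=1)
# Point-biserial discrimination: correlation of each item with the total
# score (uncorrected for part-whole overlap).
disc = np.array([np.corrcoef(u[:, j], total)[0, 1] for j in range(pool_size)])
selected = np.argsort(disc)[-keep:]   # the 40 most discriminating items

# Validity of the assembled test: correlation of its score with ability.
print(round(np.corrcoef(u[:, selected].sum(axis=1), theta)[0, 1], 3))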

Rudner, Lawrence M. – Journal of Educational Measurement, 1983
Nine indices for assessing the accuracy of an individual's test score were evaluated using simulated item responses to a commercial and a classroom test. The indices appear capable of identifying relatively high proportions of examinees with spurious total scores. (Author/PN)
Descriptors: Correlation, Item Analysis, Latent Trait Theory, Measurement Techniques
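
One classic index of this type (not necessarily among the nine evaluated) is the standardized log-likelihood person-fit statistic l_z. A minimal sketch with assumed Rasch probabilities; a scrambled copy of the same responses keeps the total score but typically yields a markedly negative l_z.

```python
import numpy as np

def lz(u, p):
    """Standardized log-likelihood person-fit statistic for one response
    vector u, given model-implied probabilities p."""
    logit = np.log(p / (1 - p))
    loglik = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
    expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    variance = np.sum(p * (1 - p) * logit**2)
    return (loglik - expected) / np.sqrt(variance)

rng = np.random.default_rng(5)
p = 1 / (1 + np.exp(-(0.5 - np.linspace(-2, 2, 20))))   # assumed Rasch probs

normal = (rng.random(20) < p).astype(float)   # model-consistent responses
spurious = rng.permutation(normal)            # scrambled, same total score
print(round(lz(normal, p), 2), round(lz(spurious, p), 2))
```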

Frary, Robert B. – Journal of Educational Measurement, 1985
Responses to a sample test were simulated for examinees under free-response and multiple-choice formats. Test score sets were correlated with randomly generated sets of unit-normal measures. The superiority of free-response tests was small enough that other considerations might justifiably dictate format choice. (Author/DWH)
Descriptors: Comparative Analysis, Computer Simulation, Essay Tests, Guessing (Tests)
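
The simulation design described lends itself to a compact sketch: generate unit-normal abilities, score a free-response format on knowledge alone and a multiple-choice format with a guessing floor, then compare the score-criterion correlations. All parameters below are assumptions, not Frary's values.

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 5000, 30
ability = rng.normal(size=n)   # unit-normal criterion measures
b = rng.normal(size=k)
p_know = 1 / (1 + np.exp(-(ability[:, None] - b)))   # chance of knowing

free = rng.random((n, k)) < p_know                       # free response
mc = rng.random((n, k)) < p_know + (1 - p_know) * 0.25   # know or guess

for name, resp in (("free-response", free), ("multiple-choice", mc)):
    r = np.corrcoef(resp.sum(axis=1), ability)[0, 1]
    print(f"{name}: r = {r:.3f}")   # the free-response edge is modest
```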

Birenbaum, Menucha; Tatsuoka, Kikumi – Journal of Educational Measurement, 1982
Empirical results from two studies, a simulation study and an experimental one, indicated that, in achievement data of the problem-solving type where a specific subject matter area is being tested, the greater the variety of the algorithms used, the higher the dimensionality of the test data. (Author/PN)
Descriptors: Achievement Tests, Algorithms, Data Analysis, Factor Structure
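
The dimensionality claim can be illustrated by simulating items that load on one versus several latent "algorithm" dimensions and inspecting the eigenvalues of the inter-item correlation matrix, a crude stand-in for the factor analyses such studies use; everything below is a hypothetical setup.

```python
import numpy as np

rng = np.random.default_rng(13)
n, k = 2000, 20

def simulate(n_algorithms):
    # Each item is solved by one of n_algorithms latent skill dimensions;
    # more algorithms means more dimensions in the response data.
    skills = rng.normal(size=(n, n_algorithms))
    assignment = rng.integers(0, n_algorithms, size=k)
    p = 1 / (1 + np.exp(-skills[:, assignment]))
    return (rng.random((n, k)) < p).astype(float)

for s in (1, 3):
    eig = np.linalg.eigvalsh(np.corrcoef(simulate(s), rowvar=False))[::-1]
    print(f"{s} algorithm(s), leading eigenvalues: {eig[:4].round(2)}")
```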