NotesFAQContact Us
Collection
Advanced
Search Tips
Showing all 4 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Slepkov, Aaron D.; Godfrey, Alan T. K. – Applied Measurement in Education, 2019
The answer-until-correct (AUC) method of multiple-choice (MC) testing involves test respondents making selections until the keyed answer is identified. Despite attendant benefits that include improved learning, broad student adoption, and facile administration of partial credit, the use of AUC methods for classroom testing has been extremely…
Descriptors: Multiple Choice Tests, Test Items, Test Reliability, Scores
Peer reviewed Peer reviewed
Direct linkDirect link
Kahraman, Nilufer; Brown, Crystal B. – Applied Measurement in Education, 2015
Psychometric models based on structural equation modeling framework are commonly used in many multiple-choice test settings to assess measurement invariance of test items across examinee subpopulations. The premise of the current article is that they may also be useful in the context of performance assessment tests to test measurement invariance…
Descriptors: Factor Analysis, Structural Equation Models, Medical Students, Performance Based Assessment
Peer reviewed Peer reviewed
Direct linkDirect link
Eckes, Thomas; Baghaei, Purya – Applied Measurement in Education, 2015
C-tests are gap-filling tests widely used to assess general language proficiency for purposes of placement, screening, or provision of feedback to language learners. C-tests consist of several short texts in which parts of words are missing. We addressed the issue of local dependence in C-tests using an explicit modeling approach based on testlet…
Descriptors: Language Proficiency, Language Tests, Item Response Theory, Test Reliability
Peer reviewed Peer reviewed
Direct linkDirect link
Powers, Donald E.; Escoffery, David S.; Duchnowski, Matthew P. – Applied Measurement in Education, 2015
By far, the most frequently used method of validating (the interpretation and use of) automated essay scores has been to compare them with scores awarded by human raters. Although this practice is questionable, human-machine agreement is still often regarded as the "gold standard." Our objective was to refine this model and apply it to…
Descriptors: Essays, Test Scoring Machines, Program Validation, Criterion Referenced Tests