NotesFAQContact Us
Collection
Advanced
Search Tips
Showing all 4 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Park, Siwon – Journal of Pan-Pacific Association of Applied Linguistics, 2017
This paper examines how different test methods may tap different aspects of second language knowledge. It employs multiple-choice (MC) and constructed response (CR) items which yield distinct or convergent information in the computer delivered testing of English in its presentation of this factor. In order to examine the effects of test method, a…
Descriptors: Evaluation Methods, Second Language Learning, English (Second Language), Computer Assisted Testing
Peer reviewed Peer reviewed
Direct linkDirect link
Dik, Bryan J.; Eldridge, Brandy M.; Steger, Michael F.; Duffy, Ryan D. – Journal of Career Assessment, 2012
Research on work as a calling is limited by measurement concerns. In response, the authors introduce the multidimensional Calling and Vocation Questionnaire (CVQ) and the Brief Calling scale (BCS), instruments assessing presence of, and search for, a calling. Study 1 describes CVQ development using exploratory and confirmatory factor analysis…
Descriptors: Multitrait Multimethod Techniques, Construct Validity, Validity, Test Reliability
Peer reviewed Peer reviewed
Direct linkDirect link
Kong, Xiaojing J.; Wise, Steven L.; Bhola, Dennison S. – Educational and Psychological Measurement, 2007
This study compared four methods for setting item response time thresholds to differentiate rapid-guessing behavior from solution behavior. Thresholds were either (a) common for all test items, (b) based on item surface features such as the amount of reading required, (c) based on visually inspecting response time frequency distributions, or (d)…
Descriptors: Test Items, Reaction Time, Timed Tests, Item Response Theory