Publication Date
In 2025: 4
Since 2024: 5
Since 2021 (last 5 years): 14
Since 2016 (last 10 years): 24
Since 2006 (last 20 years): 34
Descriptor
Computer Assisted Testing: 65
Test Items: 32
Adaptive Testing: 30
Item Response Theory: 20
Simulation: 16
Comparative Analysis: 14
Test Construction: 14
Higher Education: 13
Item Banks: 12
College Students: 11
Scores: 11
Source
Journal of Educational Measurement: 65
Author
Vispoel, Walter P.: 5
Rock, Donald A.: 4
Bennett, Randy Elliot: 3
Bridgeman, Brent: 3
Chang, Hua-Hua: 3
Tatsuoka, Kikumi K.: 3
Bleiler, Timothy: 2
Cai, Yan: 2
Choi, Ikkyu: 2
Choi, Seung W.: 2
Kim, Dong-In: 2
Publication Type
Journal Articles: 65
Reports - Research: 65
Speeches/Meeting Papers: 4
Education Level
Elementary Education: 1
Higher Education: 1
Postsecondary Education: 1
Audience
Researchers: 1
Location
United Kingdom: 1
Assessments and Surveys
Graduate Record Examinations: 4
Indiana Statewide Testing for…: 2
Advanced Placement…: 1
Law School Admission Test: 1
He, Yinhong; Qi, Yuanyuan – Journal of Educational Measurement, 2023
In multidimensional computerized adaptive testing (MCAT), item selection strategies are generally constructed based on responses, and they do not consider the response times required by items. This study constructed two new criteria (referred to as DT-inc and DT) for MCAT item selection by utilizing information from response times. The new designs…
Descriptors: Reaction Time, Adaptive Testing, Computer Assisted Testing, Test Items
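The DT-inc and DT criteria themselves are not spelled out in this excerpt. As a rough sketch of the general idea of response-time-aware selection (information gained per unit of expected time), here is a minimal Python illustration; the 2PL response model, van der Linden's lognormal response-time model, and all function names and the pool layout are assumptions for illustration, not the authors' criteria.

import numpy as np

def fisher_info_2pl(theta, a, b):
    # Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def expected_time(tau, beta, alpha):
    # Expected response time under a lognormal RT model:
    # log T ~ N(beta - tau, 1/alpha^2), so E[T] = exp(beta - tau + 1/(2 alpha^2))
    return np.exp(beta - tau + 0.5 / alpha ** 2)

def select_item(theta, tau, pool, administered):
    # Pick the unused item maximizing information per expected second.
    # pool: iterable of (a, b, beta, alpha) tuples; administered: set of indices.
    best, best_val = None, -np.inf
    for j, (a, b, beta, alpha) in enumerate(pool):
        if j in administered:
            continue
        val = fisher_info_2pl(theta, a, b) / expected_time(tau, beta, alpha)
        if val > best_val:
            best, best_val = j, val
    return best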
Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025
Recent developments in the use of large language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…
Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy
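The evaluation details are cut off in the abstract. For context, the agreement statistic most commonly reported for automated scoring engines is quadratically weighted kappa, computable as below; the score vectors are invented for illustration.

from sklearn.metrics import cohen_kappa_score

human = [3, 2, 4, 1, 3, 2]      # illustrative human ratings
machine = [3, 3, 4, 1, 2, 2]    # illustrative automated scores

# Quadratic-weighted kappa between the two score vectors.
qwk = cohen_kappa_score(human, machine, weights="quadratic")
print(f"QWK = {qwk:.3f}")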
Kylie Gorney; Mark D. Reckase – Journal of Educational Measurement, 2025
In computerized adaptive testing, item exposure control methods are often used to provide a more balanced usage of the item pool. Many of the most popular methods, including the restricted method (Revuelta and Ponsoda), use a single maximum exposure rate to limit the proportion of times that each item is administered. However, Barrada et al.…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Item Banks
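As a reference point for the restricted method named above, a bare-bones eligibility filter driven by a single maximum exposure rate might look like the sketch below; the 0.25 ceiling and variable names are assumptions, not values from the article.

import numpy as np

R_MAX = 0.25  # single maximum exposure rate, as in the restricted method

def eligible_items(exposure_counts, n_examinees, administered):
    # Keep items whose running exposure rate is still below R_MAX.
    # exposure_counts[j] = number of examinees item j has been given to so far.
    rates = np.asarray(exposure_counts, dtype=float) / max(n_examinees, 1)
    return [j for j in range(len(rates))
            if j not in administered and rates[j] < R_MAX]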
Ersen, Rabia Karatoprak; Lee, Won-Chan – Journal of Educational Measurement, 2023
The purpose of this study was to compare calibration and linking methods for placing pretest item parameter estimates on the item pool scale in a 1-3 computerized multistage adaptive testing design in terms of item parameter recovery. Two designs were used: embedded-section, in which pretest items were administered within a separate module, and…
Descriptors: Pretesting, Test Items, Computer Assisted Testing, Adaptive Testing
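The specific calibration and linking methods compared are not detailed in this excerpt; one textbook option for placing new difficulty estimates on an existing pool scale is mean/sigma linking on common (anchor) items, sketched below under that assumption.

import numpy as np

def mean_sigma_link(b_new, b_ref):
    # Linear transformation b* = A*b + B, with A and B chosen so the
    # anchor items' difficulty mean and SD match across the two scales.
    b_new = np.asarray(b_new, dtype=float)
    b_ref = np.asarray(b_ref, dtype=float)
    A = np.std(b_ref, ddof=1) / np.std(b_new, ddof=1)
    B = np.mean(b_ref) - A * np.mean(b_new)
    return A, B

# Anchor difficulties on the new form and on the pool scale (made-up numbers):
A, B = mean_sigma_link([-0.4, 0.1, 0.9], [-0.2, 0.4, 1.3])

The resulting A and B would then be applied to every pretest item estimate, not only the anchors.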
Yuan, Lu; Huang, Yingshi; Li, Shuhang; Chen, Ping – Journal of Educational Measurement, 2023
Online calibration is a key technology for item calibration in computerized adaptive testing (CAT) and has been widely used in various forms of CAT, including unidimensional CAT, multidimensional CAT (MCAT), CAT with polytomously scored items, and cognitive diagnostic CAT. However, as multidimensional and polytomous assessment data become more…
Descriptors: Computer Assisted Testing, Adaptive Testing, Computation, Test Items
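As a concrete anchor for what online calibration involves, the simplest classical approach (a Method A-style scheme, used here purely as an illustration rather than as this study's procedure) fixes the operational CAT ability estimates and fits each pretest item by maximum likelihood:

import numpy as np
from scipy.optimize import minimize

def calibrate_2pl_fixed_theta(thetas, responses):
    # Treat the CAT ability estimates as known and fit one pretest
    # item's 2PL parameters (a, b) by maximum likelihood.
    thetas = np.asarray(thetas, dtype=float)
    responses = np.asarray(responses, dtype=float)

    def nll(params):
        a, b = params
        p = 1.0 / (1.0 + np.exp(-a * (thetas - b)))
        p = np.clip(p, 1e-9, 1 - 1e-9)
        return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

    res = minimize(nll, [1.0, 0.0], bounds=[(0.2, 3.0), (-4.0, 4.0)])
    return res.x  # (a_hat, b_hat)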
Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025
In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…
Descriptors: Automation, Grading, Computer Assisted Testing, Scoring
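A minimal version of the consistency question the study raises can be phrased as rank agreement between the token-attribution vectors an explainability method produces for the same response on repeated runs; the numbers below are invented.

from scipy.stats import spearmanr

# Hypothetical token-importance scores for one answer from two repeated
# explanation runs (e.g., two random seeds of the same attribution method).
run_a = [0.42, 0.07, 0.31, 0.05, 0.15]
run_b = [0.39, 0.12, 0.28, 0.02, 0.19]

rho, _ = spearmanr(run_a, run_b)
print(f"rank-order consistency: rho = {rho:.2f}")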
Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system: multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
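A pipeline of the kind described can be sketched with the public LaBSE checkpoint and an off-the-shelf regressor; this is an assumption about the general architecture, not the authors' exact system, and the essays and scores are placeholders.

from sentence_transformers import SentenceTransformer
from sklearn.linear_model import Ridge

# Language-agnostic sentence embeddings (public LaBSE checkpoint).
encoder = SentenceTransformer("sentence-transformers/LaBSE")

train_essays = ["Ein Beispielaufsatz ...", "Un saggio di esempio ..."]
train_scores = [4.0, 2.5]  # illustrative human scores

# Embed essays, then fit a simple ridge regressor on the human scores.
X = encoder.encode(train_essays)
model = Ridge(alpha=1.0).fit(X, train_scores)

print(model.predict(encoder.encode(["Ukázková esej ..."])))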
Casabianca, Jodi M.; Donoghue, John R.; Shin, Hyo Jeong; Chao, Szu-Fu; Choi, Ikkyu – Journal of Educational Measurement, 2023
Using item-response theory to model rater effects provides an alternative solution for rater monitoring and diagnosis, compared to using standard performance metrics. To fit such models, the ratings data must be sufficiently connected for rater effects to be estimable. Due to popular rating designs used in large-scale testing scenarios,…
Descriptors: Item Response Theory, Alternative Assessment, Evaluators, Research Problems
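The connectivity requirement has a simple operational check: treat raters and responses as nodes of a bipartite graph with one edge per rating event, and count connected components. A sketch follows; the rating design shown is hypothetical.

import networkx as nx

# Each tuple is one rating event: (rater, essay).
ratings = [("r1", "e1"), ("r1", "e2"), ("r2", "e2"),
           ("r3", "e3")]  # r3 shares no essays with r1/r2: disconnected

G = nx.Graph(ratings)
parts = list(nx.connected_components(G))
print(f"{len(parts)} connected component(s)")
# More than one component means rater effects cannot all be placed
# on a common scale without additional linking.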
Jones, Paul; Tong, Ye; Liu, Jinghua; Borglum, Joshua; Primoli, Vince – Journal of Educational Measurement, 2022
This article studied two methods to detect mode effects in two credentialing exams. In Study 1, we used a "modal scale comparison approach," where the same pool of items was calibrated separately, without transformation, within two TC cohorts (TC1 and TC2) and one OP cohort (OP1) matched on their pool-based scale score distributions. The…
Descriptors: Scores, Credentials, Licensing Examinations (Professions), Computer Assisted Testing
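The article's modal scale comparison approach is only partly described here. A common generic screen for cross-mode parameter shift, shown below purely as an illustration and not as the authors' procedure, flags items whose separately calibrated difficulties differ by a robust z-type statistic.

import numpy as np

def flag_drift(b_mode1, se1, b_mode2, se2, z_crit=2.58):
    # Flag items whose difficulty differs across testing modes:
    # z = |b1 - b2| / sqrt(se1^2 + se2^2), flagged when z > z_crit.
    z = np.abs(np.asarray(b_mode1) - np.asarray(b_mode2))
    z /= np.sqrt(np.asarray(se1) ** 2 + np.asarray(se2) ** 2)
    return np.nonzero(z > z_crit)[0]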
Xu, Lingling; Wang, Shiyu; Cai, Yan; Tu, Dongbo – Journal of Educational Measurement, 2021
Designing a multidimensional multistage adaptive test (M-MST) based on a multidimensional item response theory (MIRT) model is critical to making full use of the advantages of both MST and MIRT in implementing multidimensional assessments. This study proposed two types of automated test assembly (ATA) algorithms and one set of routing rules that can facilitate…
Descriptors: Item Response Theory, Adaptive Testing, Automation, Test Construction
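The proposed ATA algorithms are not reproduced in the excerpt. As a baseline for what automated assembly of a single MST module can mean, here is a greedy sketch that fills a module with the items most informative at a routing point, assuming a 2PL pool; a production ATA would add content constraints and typically use 0-1 programming.

import numpy as np

def greedy_module(a, b, theta_target, length, used=()):
    # Build one module from the items with maximal 2PL information
    # at the module's target ability point.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    p = 1.0 / (1.0 + np.exp(-a * (theta_target - b)))
    info = a ** 2 * p * (1 - p)
    order = np.argsort(-info)  # most informative first
    return [int(j) for j in order if j not in used][:length]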
Yang Jiang; Mo Zhang; Jiangang Hao; Paul Deane; Chen Li – Journal of Educational Measurement, 2024
The emergence of sophisticated AI tools such as ChatGPT, coupled with the transition to remote delivery of educational assessments in the COVID-19 era, has led to increasing concerns about academic integrity and test security. Using AI tools, test takers can produce high-quality texts effortlessly and use them to game assessments. It is thus…
Descriptors: Integrity, Artificial Intelligence, Technology Uses in Education, Ethics
Bengs, Daniel; Kroehne, Ulf; Brefeld, Ulf – Journal of Educational Measurement, 2021
By tailoring test forms to the test-taker's proficiency, Computerized Adaptive Testing (CAT) enables substantial increases in testing efficiency over fixed forms testing. When used for formative assessment, the alignment of task difficulty with proficiency increases the chance that teachers can derive useful feedback from assessment data. The…
Descriptors: Computer Assisted Testing, Formative Evaluation, Group Testing, Program Effectiveness
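For readers new to CAT, the loop being tailored is short. The sketch below is a generic Rasch CAT with nearest-difficulty selection and a one-step Newton ability update, not the article's method; answer() stands in for the examinee.

import numpy as np

def rasch_p(theta, b):
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def run_cat(b_pool, answer, n_items=10):
    # Give the unused item whose difficulty is closest to the current
    # ability estimate, then update theta by one Newton-Raphson step.
    theta, used, resp = 0.0, [], []
    for _ in range(n_items):
        j = min((k for k in range(len(b_pool)) if k not in used),
                key=lambda k: abs(b_pool[k] - theta))
        used.append(j)
        resp.append(answer(j))
        p = rasch_p(theta, np.asarray(b_pool)[used])
        info = np.sum(p * (1 - p))
        theta += np.sum(np.asarray(resp) - p) / max(info, 1e-6)
    return theta, used

# Simulated examinee with true theta = 1.0 and a standard-normal pool:
rng = np.random.default_rng(0)
pool = rng.normal(size=200)
est, items = run_cat(pool, lambda j: int(rng.random() < rasch_p(1.0, pool[j])))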
A. Corinne Huggins-Manley; Brandon M. Booth; Sidney K. D'Mello – Journal of Educational Measurement, 2022
The field of educational measurement places validity and fairness as central concepts of assessment quality. Prior research has proposed embedding fairness arguments within argument-based validity processes, particularly when fairness is conceived as comparability in assessment properties across groups. However, we argue that a more flexible…
Descriptors: Educational Assessment, Persuasive Discourse, Validity, Artificial Intelligence
Wyse, Adam E.; McBride, James R. – Journal of Educational Measurement, 2021
A key consideration when giving any computerized adaptive test (CAT) is how much adaptation is present when the test is used in practice. This study introduces a new framework to measure the amount of adaptation of Rasch-based CATs based on the differences between the selected item locations (Rasch item difficulty parameters) of the…
Descriptors: Item Response Theory, Computer Assisted Testing, Adaptive Testing, Test Items
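The framework's statistics are cut off in the abstract. In that spirit, two illustrative summaries of how adaptive an administration was can be computed from the administered difficulties and the interim ability estimates that selected them; this is a hypothetical measure for orientation, not the authors' framework.

import numpy as np

def adaptation_summary(b_administered, theta_interim):
    # How closely administered item difficulties track the interim
    # ability estimates that triggered their selection.
    b = np.asarray(b_administered, dtype=float)
    t = np.asarray(theta_interim, dtype=float)
    return {"mean_abs_gap": float(np.mean(np.abs(b - t))),
            "b_theta_corr": float(np.corrcoef(b, t)[0, 1])}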
Chen, Chia-Wen; Wang, Wen-Chung; Chiu, Ming Ming; Ro, Sage – Journal of Educational Measurement, 2020
The use of computerized adaptive testing algorithms for ranking items (e.g., college preferences, career choices) involves two major challenges: unacceptably high computation times (selecting from a large item pool with many dimensions) and biased results (enhanced preferences or intensified examinee responses because of repeated statements across…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection