NotesFAQContact Us
Collection
Advanced
Search Tips
Showing all 13 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
Peer reviewed Peer reviewed
Direct linkDirect link
A. Corinne Huggins-Manley; Brandon M. Booth; Sidney K. D'Mello – Journal of Educational Measurement, 2022
The field of educational measurement places validity and fairness as central concepts of assessment quality. Prior research has proposed embedding fairness arguments within argument-based validity processes, particularly when fairness is conceived as comparability in assessment properties across groups. However, we argue that a more flexible…
Descriptors: Educational Assessment, Persuasive Discourse, Validity, Artificial Intelligence
Peer reviewed Peer reviewed
Direct linkDirect link
Dorsey, David W.; Michaels, Hillary R. – Journal of Educational Measurement, 2022
We have dramatically advanced our ability to create rich, complex, and effective assessments across a range of uses through technology advancement. Artificial Intelligence (AI) enabled assessments represent one such area of advancement--one that has captured our collective interest and imagination. Scientists and practitioners within the domains…
Descriptors: Validity, Ethics, Artificial Intelligence, Evaluation Methods
Peer reviewed Peer reviewed
Direct linkDirect link
Wang, Shiyu; Lin, Haiyan; Chang, Hua-Hua; Douglas, Jeff – Journal of Educational Measurement, 2016
Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing. Though most designs of CAT and MST exhibit strength and weakness in recent large-scale implementations, there is no simple answer to the question of which design is better because different…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Sequential Approach
Peer reviewed Peer reviewed
Williamson, David M.; Bejar, Isaac I.; Hone, Anne S. – Journal of Educational Measurement, 1999
Contrasts "mental models" used by automated scoring for the simulation division of the computerized Architect Registration Examination with those used by experienced human graders for 3,613 candidate solutions. Discusses differences in the models used and the potential of automated scoring to enhance the validity evidence of scores. (SLD)
Descriptors: Architects, Comparative Analysis, Computer Assisted Testing, Judges
Peer reviewed Peer reviewed
Wang, Tianyou; Kolen, Michael J. – Journal of Educational Measurement, 2001
Reviews research literature on comparability issues in computerized adaptive testing (CAT) and synthesizes issues specific to comparability and test security. Develops a framework for evaluating comparability that contains three categories of criteria: (1) validity; (2) psychometric property/reliability; and (3) statistical assumption/test…
Descriptors: Adaptive Testing, Comparative Analysis, Computer Assisted Testing, Criteria
Peer reviewed Peer reviewed
Bennett, Randy Elliot; Rock, Donald A. – Journal of Educational Measurement, 1995
Examined the generalizability and validity and examinee perceptions of a computer-delivered version of 8 formulating-hypotheses tasks administered to 192 graduate students. Results support previous research that has suggested that formulating-hypotheses items can broaden the abilities measured by graduate admissions measures. (SLD)
Descriptors: Admission (School), College Entrance Examinations, Computer Assisted Testing, Generalizability Theory
Peer reviewed Peer reviewed
Enright, Mary K.; Rock, Donald A.; Bennett, Randy Elliot – Journal of Educational Measurement, 1998
Examined alternative-item types and section configurations for improving the discriminant and convergent validity of the Graduate Record Examination (GRE) general test using a computer-based test given to 388 examinees who had taken the GRE previously. Adding new variations of logical meaning appeared to decrease discriminant validity. (SLD)
Descriptors: Admission (School), College Entrance Examinations, College Students, Computer Assisted Testing
Peer reviewed Peer reviewed
Tatsuoka, Kikumi K.; Tatsuoka, Maurice M. – Journal of Educational Measurement, 1997
Results of studies involving 478 junior high school students in two years using cognitive diagnoses done through computerized adaptive testing indicate that knowing students' knowledge states before remediation is effective, and that the rule-space method can diagnose these knowledge states effectively. (SLD)
Descriptors: Adaptive Testing, Cognitive Tests, Computer Assisted Testing, Diagnostic Tests
Peer reviewed Peer reviewed
Wainer, Howard; And Others – Journal of Educational Measurement, 1992
Computer simulations were run to measure the relationship between testlet validity and factors of item pool size and testlet length for both adaptive and linearly constructed testlets. Making a testlet adaptive yields only modest increases in aggregate validity because of the peakedness of the typical proficiency distribution. (Author/SLD)
Descriptors: Adaptive Testing, Comparative Testing, Computer Assisted Testing, Computer Simulation
Peer reviewed Peer reviewed
Vispoel, Walter P.; And Others – Journal of Educational Measurement, 1997
Efficiency, precision, and concurrent validity of results from adaptive and fixed-item music listening tests were studied using: (1) 2,200 simulated examinees; (2) 204 live examinees; and (3) 172 live examinees. Results support the usefulness of adaptive tests for measuring skills that require aurally produced items. (SLD)
Descriptors: Adaptive Testing, Adults, College Students, Comparative Analysis
Peer reviewed Peer reviewed
Bennett, Randy Elliot; Sebrechts, Marc M. – Journal of Educational Measurement, 1997
A computer-delivered problem-solving task based on cognitive research literature was developed and its validity for graduate admissions assessment was studied with 107 undergraduates. Use of the test, which asked examinees to sort word-problem stems by prototypes, was supported by the findings. (SLD)
Descriptors: Admission (School), College Entrance Examinations, Computer Assisted Testing, Graduate Study
Peer reviewed Peer reviewed
Johnson, Sandra; Bell, John F. – Journal of Educational Measurement, 1985
The assessment framework underlying a science performance monitoring program is process-oriented and intended to appeal to generalizability theory for a suitable estimation paradigm. Preliminary applications are described. Results suggest that computerized question-banking, domain-sampling of questions, and generalizablity theory together provide…
Descriptors: Academic Achievement, Computer Assisted Testing, Educational Assessment, Foreign Countries