Publication Date
In 2025 | 1 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 3 |
Since 2016 (last 10 years) | 4 |
Since 2006 (last 20 years) | 4 |
Source
Journal of Educational… | 13 |
Author
Bennett, Randy Elliot | 3 |
Rock, Donald A. | 2 |
A. Corinne Huggins-Manley | 1 |
Bejar, Isaac I. | 1 |
Bell, John F. | 1 |
Brandon M. Booth | 1 |
Chang, Hua-Hua | 1 |
Dorsey, David W. | 1 |
Douglas, Jeff | 1 |
Enright, Mary K. | 1 |
Hamid Mohammadi | 1 |
Publication Type
Journal Articles | 13 |
Reports - Research | 8 |
Reports - Evaluative | 4 |
Information Analyses | 1 |
Speeches/Meeting Papers | 1 |
Education Level
Higher Education | 1 |
Postsecondary Education | 1 |
Audience
Researchers | 1 |
Location
United Kingdom | 1 |
Assessments and Surveys
Graduate Record Examinations | 1 |
Firoozi, Tahereh; Mohammadi, Hamid; Gierl, Mark J. – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two sentence embedding models were evaluated within the AES system: multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
Huggins-Manley, A. Corinne; Booth, Brandon M.; D'Mello, Sidney K. – Journal of Educational Measurement, 2022
The field of educational measurement places validity and fairness as central concepts of assessment quality. Prior research has proposed embedding fairness arguments within argument-based validity processes, particularly when fairness is conceived as comparability in assessment properties across groups. However, we argue that a more flexible…
Descriptors: Educational Assessment, Persuasive Discourse, Validity, Artificial Intelligence
Dorsey, David W.; Michaels, Hillary R. – Journal of Educational Measurement, 2022
Technological advances have dramatically improved our ability to create rich, complex, and effective assessments across a range of uses. Artificial intelligence (AI)-enabled assessments represent one such area of advancement, one that has captured our collective interest and imagination. Scientists and practitioners within the domains…
Descriptors: Validity, Ethics, Artificial Intelligence, Evaluation Methods
Wang, Shiyu; Lin, Haiyan; Chang, Hua-Hua; Douglas, Jeff – Journal of Educational Measurement, 2016
Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes of large-scale, computer-based sequential testing. Although CAT and MST designs have each shown strengths and weaknesses in recent large-scale implementations, there is no simple answer to the question of which design is better because different…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Sequential Approach

Williamson, David M.; Bejar, Isaac I.; Hone, Anne S. – Journal of Educational Measurement, 1999
Contrasts "mental models" used by automated scoring for the simulation division of the computerized Architect Registration Examination with those used by experienced human graders for 3,613 candidate solutions. Discusses differences in the models used and the potential of automated scoring to enhance the validity evidence of scores. (SLD)
Descriptors: Architects, Comparative Analysis, Computer Assisted Testing, Judges

Wang, Tianyou; Kolen, Michael J. – Journal of Educational Measurement, 2001
Reviews research literature on comparability issues in computerized adaptive testing (CAT) and synthesizes issues specific to comparability and test security. Develops a framework for evaluating comparability that contains three categories of criteria: (1) validity; (2) psychometric property/reliability; and (3) statistical assumption/test…
Descriptors: Adaptive Testing, Comparative Analysis, Computer Assisted Testing, Criteria

Bennett, Randy Elliot; Rock, Donald A. – Journal of Educational Measurement, 1995
Examined the generalizability, validity, and examinee perceptions of a computer-delivered version of eight formulating-hypotheses tasks administered to 192 graduate students. Results support previous research suggesting that formulating-hypotheses items can broaden the abilities measured by graduate admissions measures. (SLD)
Descriptors: Admission (School), College Entrance Examinations, Computer Assisted Testing, Generalizability Theory

Enright, Mary K.; Rock, Donald A.; Bennett, Randy Elliot – Journal of Educational Measurement, 1998
Examined alternative-item types and section configurations for improving the discriminant and convergent validity of the Graduate Record Examination (GRE) general test using a computer-based test given to 388 examinees who had taken the GRE previously. Adding new variations of logical meaning appeared to decrease discriminant validity. (SLD)
Descriptors: Admission (School), College Entrance Examinations, College Students, Computer Assisted Testing

Tatsuoka, Kikumi K.; Tatsuoka, Maurice M. – Journal of Educational Measurement, 1997
Results of studies conducted across two years with 478 junior high school students, using cognitive diagnoses made through computerized adaptive testing, indicate that knowing students' knowledge states before remediation is effective and that the rule-space method can diagnose these knowledge states effectively. (SLD)
Descriptors: Adaptive Testing, Cognitive Tests, Computer Assisted Testing, Diagnostic Tests

Wainer, Howard; And Others – Journal of Educational Measurement, 1992
Computer simulations were run to measure how testlet validity relates to item pool size and testlet length for both adaptively and linearly constructed testlets. Making a testlet adaptive yields only modest increases in aggregate validity because of the peakedness of the typical proficiency distribution. (Author/SLD)
Descriptors: Adaptive Testing, Comparative Testing, Computer Assisted Testing, Computer Simulation

Vispoel, Walter P.; And Others – Journal of Educational Measurement, 1997
Efficiency, precision, and concurrent validity of results from adaptive and fixed-item music listening tests were studied using: (1) 2,200 simulated examinees; (2) 204 live examinees; and (3) 172 live examinees. Results support the usefulness of adaptive tests for measuring skills that require aurally produced items. (SLD)
Descriptors: Adaptive Testing, Adults, College Students, Comparative Analysis

Bennett, Randy Elliot; Sebrechts, Marc M. – Journal of Educational Measurement, 1997
A computer-delivered problem-solving task based on the cognitive research literature was developed, and its validity for graduate admissions assessment was studied with 107 undergraduates. The findings supported use of the test, which asked examinees to sort word-problem stems by prototype. (SLD)
Descriptors: Admission (School), College Entrance Examinations, Computer Assisted Testing, Graduate Study

Johnson, Sandra; Bell, John F. – Journal of Educational Measurement, 1985
The assessment framework underlying a science performance monitoring program is process-oriented and draws on generalizability theory as its estimation paradigm. Preliminary applications are described. Results suggest that computerized question-banking, domain-sampling of questions, and generalizability theory together provide…
Descriptors: Academic Achievement, Computer Assisted Testing, Educational Assessment, Foreign Countries