Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025
Recent developments in the use of large-language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…
Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy
Yuan, Lu; Huang, Yingshi; Li, Shuhang; Chen, Ping – Journal of Educational Measurement, 2023
Online calibration is a key technology for item calibration in computerized adaptive testing (CAT) and has been widely used in various forms of CAT, including unidimensional CAT, multidimensional CAT (MCAT), CAT with polytomously scored items, and cognitive diagnostic CAT. However, as multidimensional and polytomous assessment data become more…
Descriptors: Computer Assisted Testing, Adaptive Testing, Computation, Test Items
Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025
In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…
Descriptors: Automation, Grading, Computer Assisted Testing, Scoring
Casabianca, Jodi M.; Donoghue, John R.; Shin, Hyo Jeong; Chao, Szu-Fu; Choi, Ikkyu – Journal of Educational Measurement, 2023
Using item response theory to model rater effects provides an alternative to standard performance metrics for rater monitoring and diagnosis. To fit such models, however, the ratings data must be sufficiently connected to estimate rater effects. Due to popular rating designs used in large-scale testing scenarios,…
Descriptors: Item Response Theory, Alternative Assessment, Evaluators, Research Problems
Dorsey, David W.; Michaels, Hillary R. – Journal of Educational Measurement, 2022
Advances in technology have dramatically improved our ability to create rich, complex, and effective assessments across a range of uses. Artificial Intelligence (AI) enabled assessments represent one such area of advancement--one that has captured our collective interest and imagination. Scientists and practitioners within the domains…
Descriptors: Validity, Ethics, Artificial Intelligence, Evaluation Methods
Shermis, Mark D.; Lottridge, Sue; Mayfield, Elijah – Journal of Educational Measurement, 2015
This study investigated the impact of anonymizing text on predicted scores made by two kinds of automated scoring engines: one that incorporates elements of natural language processing (NLP) and one that does not. Eight data sets (N = 22,029) were used to form both training and test sets in which the scoring engines had access to both text and…
Descriptors: Scoring, Essays, Computer Assisted Testing, Natural Language Processing

Davey, Tim; And Others – Journal of Educational Measurement, 1997
The development and scoring of a recently introduced computer-based writing skills test is described. The test asks the examinee to edit a writing passage presented on a computer screen. Scoring difficulties are addressed through the combined use of option weighting and the sequential probability ratio test. (SLD)
Descriptors: Computer Assisted Testing, Educational Innovation, Probability, Scoring
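The sequential probability ratio test mentioned in the abstract above classifies an examinee by accumulating a log-likelihood ratio over scored items until it crosses a decision boundary. The sketch below is an illustrative, generic SPRT for pass/fail classification, not the specific procedure Davey et al. used; the hypothesized per-item probabilities `p0` and `p1` are assumed inputs.

```python
import math

def sprt_decision(responses, p0, p1, alpha=0.05, beta=0.05):
    """Sequential probability ratio test for pass/fail classification.

    responses: list of 0/1 item scores.
    p0, p1: per-item probabilities of a correct response under the
            'fail' (H0) and 'pass' (H1) hypotheses, respectively.
    alpha, beta: tolerated Type I / Type II error rates.
    """
    lower = math.log(beta / (1 - alpha))   # accept H0 (fail) at or below this
    upper = math.log((1 - beta) / alpha)   # accept H1 (pass) at or above this
    llr = 0.0
    for x, q0, q1 in zip(responses, p0, p1):
        # Add this item's contribution to the log-likelihood ratio
        llr += math.log((q1 if x else 1 - q1) / (q0 if x else 1 - q0))
        if llr <= lower:
            return "fail"
        if llr >= upper:
            return "pass"
    return "undecided"  # item pool exhausted before a boundary was crossed
```

A consistently strong response pattern crosses the upper boundary after only a few items, which is why the SPRT pairs naturally with computer-based delivery: the test can stop as soon as a confident classification is reached.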

Williamson, David M.; Bejar, Isaac I.; Hone, Anne S. – Journal of Educational Measurement, 1999
Contrasts "mental models" used by automated scoring for the simulation division of the computerized Architect Registration Examination with those used by experienced human graders for 3,613 candidate solutions. Discusses differences in the models used and the potential of automated scoring to enhance the validity evidence of scores. (SLD)
Descriptors: Architects, Comparative Analysis, Computer Assisted Testing, Judges

Thissen, David; And Others – Journal of Educational Measurement, 1989
An approach to scoring reading comprehension based on the concept of the testlet is described, using models developed for items in multiple categories. The model is illustrated using data from 3,866 examinees. Application of testlet scoring to multiple category models developed for individual items is discussed. (SLD)
Descriptors: Adaptive Testing, Computer Assisted Testing, Item Response Theory, Mathematical Models

Clauser, Brian E.; Margolis, Melissa J.; Clyman, Stephen G.; Ross, Linette P. – Journal of Educational Measurement, 1997
Research on automated scoring is extended by comparing alternative automated systems for scoring a computer simulation of physicians' patient management skills. A regression-based system is more highly correlated with experts' evaluations than a system that uses complex rules to map performances into score levels, but both approaches are feasible.…
Descriptors: Algorithms, Automation, Comparative Analysis, Computer Assisted Testing
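A regression-based scoring system of the kind compared in the record above fits weights that map features of a performance onto expert holistic ratings. The sketch below is a minimal illustration under assumed data: the feature names (counts of beneficial, neutral, and risky actions) and the numbers are hypothetical, not taken from the study.

```python
import numpy as np

# Hypothetical features extracted from simulated patient-management
# performances (counts of beneficial, neutral, and risky actions),
# paired with expert holistic ratings. Illustrative data only.
X = np.array([[9, 2, 0],
              [6, 3, 1],
              [3, 4, 3],
              [1, 2, 5]], dtype=float)
expert_scores = np.array([9.0, 7.0, 4.5, 2.0])

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones(len(X)), X])
weights, *_ = np.linalg.lstsq(A, expert_scores, rcond=None)

def score(features):
    """Predict an expert-style score from performance features."""
    return float(weights[0] + np.dot(weights[1:], features))
```

The rule-based alternative the study describes instead maps performances to score levels through hand-built logical rules; the regression approach trades that transparency for a closer fit to the experts' ratings.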

Bennett, Randy Elliot; Steffen, Manfred; Singley, Mark Kevin; Morley, Mary; Jacquemin, Daniel – Journal of Educational Measurement, 1997
Scoring accuracy and item functioning were studied for an open-ended response type test in which correct answers can take many different surface forms. Results with 1,864 graduate school applicants showed automated scoring to approximate the accuracy of multiple-choice scoring. Items functioned similarly to other item types being considered. (SLD)
Descriptors: Adaptive Testing, Automation, College Applicants, Computer Assisted Testing

Wainer, Howard; Lewis, Charles – Journal of Educational Measurement, 1990
Three different applications of the testlet concept are presented, and the psychometric models most suitable for each application are described. Difficulties that testlets can help overcome include (1) context effects; (2) item ordering; and (3) content balancing. Implications for test construction are discussed. (SLD)
Descriptors: Algorithms, Computer Assisted Testing, Elementary Secondary Education, Item Response Theory

Patience, Wayne – Journal of Educational Measurement, 1990
The four main subsystems of the MicroCAT Testing System for developing, administering, scoring, and analyzing computerized tests using conventional or item response theory methods are described. Judgments of three users of the system are included in the evaluation of this software. (SLD)
Descriptors: Adaptive Testing, Computer Assisted Testing, Computer Software, Computer Software Reviews

Braun, Henry I.; And Others – Journal of Educational Measurement, 1990
The accuracy with which expert systems (ESs) score a new nonmultiple-choice free-response test item was investigated, using 734 high school students who were administered an advanced-placement computer science examination. ESs produced scores for 82 percent to 95 percent of the responses and displayed high agreement with a human reader on the…
Descriptors: Advanced Placement, Computer Assisted Testing, Computer Science, Constructed Response