Showing all 14 results
Peer reviewed
Direct link
Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025
Recent developments in the use of large language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…
Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy
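The pipeline such systems share is simple to state: train on responses scored by humans, then predict scores for new responses. A minimal sketch in Python, using TF-IDF features and logistic regression as a deliberately lightweight stand-in for the large language models the article actually concerns; all responses, scores, and names below are hypothetical.

```python
# Content-based automated scoring sketch: fit on human-scored
# free-text responses, then score unseen ones. A simple stand-in
# for the LLM-based systems discussed in the article.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical responses with human-assigned scores (1 = correct, 0 = incorrect).
responses = [
    "Insulin lowers blood glucose by promoting cellular uptake.",
    "Glucose goes up when insulin is released.",
    "Insulin signals cells to absorb glucose from the blood.",
    "Insulin is made in the stomach and digests sugar.",
]
human_scores = [1, 0, 1, 0]

scorer = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
scorer.fit(responses, human_scores)

new_response = ["Insulin helps cells take in glucose, reducing blood sugar."]
print(scorer.predict(new_response))        # predicted score
print(scorer.predict_proba(new_response))  # scoring confidence
```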
Peer reviewed
Direct link
Lim, Hwanggyu; Choe, Edison M. – Journal of Educational Measurement, 2023
The residual differential item functioning (RDIF) detection framework was recently developed in the context of linear testing. To explore the potential application of this framework to computerized adaptive testing (CAT), the present study investigated the utility of the RDIF_R statistic both as an index for detecting uniform DIF of…
Descriptors: Test Items, Computer Assisted Testing, Item Response Theory, Adaptive Testing
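A rough illustration of the residual-based idea behind this framework: compare mean raw residuals (observed response minus model-expected probability of a correct response) between reference and focal groups for a single item. This is a simplified sketch under assumed 2PL item parameters and simulated data, not the paper's exact RDIF_R statistic.

```python
# Residual-based uniform DIF sketch: a negative mean-residual gap for
# the focal group suggests the item is harder for them than the model
# predicts. Item parameters and group abilities are hypothetical.
import numpy as np

def p_2pl(theta, a, b):
    """Expected probability of a correct response under a 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

rng = np.random.default_rng(0)
a, b = 1.2, 0.0                       # hypothetical item parameters
theta_ref = rng.normal(0, 1, 2000)    # reference-group abilities
theta_foc = rng.normal(0, 1, 2000)    # focal-group abilities

# Simulate responses; shift the focal group's difficulty to inject uniform DIF.
y_ref = rng.binomial(1, p_2pl(theta_ref, a, b))
y_foc = rng.binomial(1, p_2pl(theta_foc, a, b + 0.5))

# Residual DIF index: difference in mean raw residuals between groups.
resid_ref = y_ref - p_2pl(theta_ref, a, b)
resid_foc = y_foc - p_2pl(theta_foc, a, b)
print(resid_foc.mean() - resid_ref.mean())  # clearly negative => uniform DIF
```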
Peer reviewed
Direct link
Schneider, Johannes; Richner, Robin; Riser, Micha – International Journal of Artificial Intelligence in Education, 2023
Autograding short textual answers has become much more feasible with the rise of NLP and the increased availability of question-answer pairs brought about by the shift to online education. Autograding performance, however, is still inferior to human grading. The statistical and black-box nature of state-of-the-art machine learning models makes them…
Descriptors: Grading, Natural Language Processing, Computer Assisted Testing, Ethics
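One simple, inspectable baseline for short-answer autograding, relevant to the black-box concern raised here, is similarity to a reference answer with the overlapping terms reported as a rationale. A sketch with hypothetical data and an arbitrary credit threshold; this is an illustration of the general technique, not the authors' system.

```python
# Transparent short-answer grading sketch: TF-IDF cosine similarity
# to a reference answer, plus the shared terms as a simple rationale.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reference = "Photosynthesis converts light energy into chemical energy in plants."
answer = "Plants turn light energy into chemical energy during photosynthesis."

vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform([reference, answer])
score = cosine_similarity(tfidf[0], tfidf[1])[0, 0]

# Terms present in both texts: an inspectable, if crude, explanation.
shared = set(vec.inverse_transform(tfidf[0])[0]) & set(vec.inverse_transform(tfidf[1])[0])
print(f"similarity={score:.2f}", "credit" if score >= 0.5 else "no credit")
print("shared terms:", sorted(shared))
```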
Yi Gui – ProQuest LLC, 2024
This study explores using transfer learning in machine learning for natural language processing (NLP) to create generic automated essay scoring (AES) models, providing instant online scoring for statewide writing assessments in K-12 education. The goal is to develop an instant online scorer that is generalizable to any prompt, addressing the…
Descriptors: Writing Tests, Natural Language Processing, Writing Evaluation, Scoring
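The transfer-learning recipe behind generic AES models of this kind can be stated briefly: start from a pretrained language model and fine-tune a regression head on human-scored essays drawn from many prompts. A minimal sketch with the Hugging Face transformers API; the checkpoint, essays, and scores are placeholder assumptions, and a real system would train on large multi-prompt corpora.

```python
# Transfer-learning AES sketch: pretrained encoder + regression head,
# fine-tuned on human holistic scores. Data here are toy placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=1, problem_type="regression"
)

essays = ["A well-organized essay with clear claims and evidence...",
          "Short essay with unsupported claims."]
scores = torch.tensor([[5.0], [2.0]])  # hypothetical human holistic scores

batch = tok(essays, padding=True, truncation=True, return_tensors="pt")
out = model(**batch, labels=scores)
out.loss.backward()  # one illustrative gradient step of fine-tuning
print(out.logits)    # predicted scores (untrained at this point)
```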
Peer reviewed
Direct link
Reinertsen, Nathanael – English in Australia, 2018
The difference in how humans read and how Automated Essay Scoring (AES) systems process written language leads to a situation where a portion of student responses will be comprehensible to human markers but cannot be parsed by AES systems. This paper examines a number of pieces of student writing that were marked by trained human markers, but…
Descriptors: Qualitative Research, Writing Evaluation, Essay Tests, Computer Assisted Testing
Peer reviewed
Direct link
Dalton, Sarah Grace; Stark, Brielle C.; Fromm, Davida; Apple, Kristen; MacWhinney, Brian; Rensch, Amanda; Rowedder, Madyson – Journal of Speech, Language, and Hearing Research, 2022
Purpose: The aim of this study was to advance the use of structured, monologic discourse analysis by validating an automated scoring procedure for core lexicon (CoreLex) using transcripts. Method: Forty-nine transcripts from persons with aphasia and 48 transcripts from persons with no brain injury were retrieved from the AphasiaBank database. Five…
Descriptors: Validity, Discourse Analysis, Databases, Scoring
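At its simplest, core-lexicon scoring reduces to checking which items from a fixed word list appear in a discourse transcript. A toy sketch of that procedure; the word list below is hypothetical and far shorter than the validated CoreLex checklists used in the study.

```python
# Core-lexicon (CoreLex) scoring sketch: count distinct core items
# produced in a transcript. The core list here is a hypothetical toy.
CORE_LEXICON = {"boy", "girl", "fall", "water", "tree", "run"}

def corelex_score(transcript: str, core=CORE_LEXICON) -> int:
    """Return the number of distinct core-lexicon items produced."""
    tokens = {w.strip(".,!?").lower() for w in transcript.split()}
    return len(core & tokens)

transcript = "The boy climbed the tree and the girl saw him fall."
print(corelex_score(transcript))  # 4 of the 6 core items produced
```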
Peer reviewed
Direct link
Evans, William S.; Cavanaugh, Robert; Quique, Yina; Boss, Emily; Starns, Jeffrey J.; Hula, William D. – Journal of Speech, Language, and Hearing Research, 2021
Purpose: The purpose of this study was to develop and pilot a novel treatment framework called "BEARS" (Balancing Effort, Accuracy, and Response Speed). People with aphasia (PWA) have been shown to maladaptively balance speed and accuracy during language tasks. BEARS is designed to train PWA to balance speed-accuracy trade-offs and…
Descriptors: Accuracy, Semantics, Aphasia, Reaction Time
Peer reviewed
Direct link
Jiao, Yishan; LaCross, Amy; Berisha, Visar; Liss, Julie – Journal of Speech, Language, and Hearing Research, 2019
Purpose: Subjective speech intelligibility assessment is often preferred over more objective approaches that rely on transcript scoring. This is, in part, because of the intensive manual labor associated with extracting objective metrics from transcribed speech. In this study, we propose an automated approach for scoring transcripts that provides…
Descriptors: Suprasegmentals, Phonemes, Error Patterns, Scoring
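One way to automate transcript scoring of this kind is to align the listener's transcript against the target at the phoneme level and tally mismatches as error patterns. A sketch using Python's difflib with hypothetical ARPABET-style sequences; a real system would derive phoneme strings with a grapheme-to-phoneme tool, and this is not the authors' specific method.

```python
# Transcript scoring sketch: phoneme-level alignment yields both an
# intelligibility count and a list of error patterns.
from difflib import SequenceMatcher

target   = ["DH", "AH", "K", "AE", "T", "S", "AE", "T"]   # "the cat sat"
produced = ["DH", "AH", "K", "AE", "P", "S", "AE", "T"]   # /T/ -> /P/ error

ops = SequenceMatcher(a=target, b=produced).get_opcodes()
errors = [(tag, target[i1:i2], produced[j1:j2])
          for tag, i1, i2, j1, j2 in ops if tag != "equal"]
matched = sum(i2 - i1 for tag, i1, i2, _, _ in ops if tag == "equal")

print(f"phonemes correct: {matched}/{len(target)}")  # 7/8
print("error patterns:", errors)                     # [('replace', ['T'], ['P'])]
```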
Peer reviewed
Direct link
Mao, Liyang; Liu, Ou Lydia; Roohr, Katrina; Belur, Vinetha; Mulholland, Matthew; Lee, Hee-Sun; Pallant, Amy – Educational Assessment, 2018
Scientific argumentation is one of the core practices for teachers to implement in science classrooms. We developed a computer-based formative assessment to support students' construction and revision of scientific arguments. The assessment is built upon automated scoring of students' arguments and provides feedback to students and teachers.…
Descriptors: Computer Assisted Testing, Science Tests, Scoring, Automation
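The formative loop described here pairs an automated argument score with targeted revision feedback. A minimal sketch of the score-to-feedback mapping; the levels and messages below are hypothetical, not the assessment's actual rubric.

```python
# Score-contingent formative feedback sketch: map an automated
# argument score to a canned revision hint (hypothetical rubric).
FEEDBACK = {
    0: "State a claim about the phenomenon before adding evidence.",
    1: "You have a claim; add data that support it.",
    2: "Good claim and evidence; explain how the evidence supports the claim.",
    3: "Complete argument. Consider addressing a counterargument.",
}

def feedback_for(score: int) -> str:
    return FEEDBACK.get(score, "Score out of range.")

print(feedback_for(1))
```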
Peer reviewed
Download full text (PDF on ERIC)
Li, Zhi; Feng, Hui-Hsien; Saricaoglu, Aysel – CALICO Journal, 2017
This classroom-based study employs a mixed-methods approach to exploring both short-term and long-term effects of Criterion feedback on ESL students' development of grammatical accuracy. The results of multilevel growth modeling indicate that Criterion feedback helps students at both intermediate-high and advanced-low levels reduce errors in eight…
Descriptors: Feedback (Response), Criterion Referenced Tests, Computer Assisted Testing, Writing Tests
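A multilevel growth model of this shape can be sketched with statsmodels: random intercepts for students and a fixed slope for successive drafts, with simulated data standing in for the study's Criterion corpus; all variable names are hypothetical.

```python
# Multilevel growth model sketch: does the error count drop across
# drafts once student-level variation is accounted for?
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_students, n_drafts = 30, 4
df = pd.DataFrame({
    "student": np.repeat(np.arange(n_students), n_drafts),
    "draft": np.tile(np.arange(n_drafts), n_students),
})
student_intercepts = rng.normal(10, 2, n_students)
df["errors"] = (student_intercepts[df["student"]]
                - 1.5 * df["draft"]               # errors drop with feedback
                + rng.normal(0, 1, len(df)))

result = smf.mixedlm("errors ~ draft", df, groups=df["student"]).fit()
print(result.params["draft"])  # negative slope ~= error reduction per draft
```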
Peer reviewed
Direct link
Jackson, Margaret C.; Linden, David E. J.; Roberts, Mark V.; Kriegeskorte, Nikolaus; Haenschel, Corinna – Journal of Experimental Psychology: Learning, Memory, and Cognition, 2015
A number of studies have shown that visual working memory (WM) is poorer for complex versus simple items, traditionally accounted for by higher information load placing greater demands on encoding and storage capacity limits. Other research suggests that it may not be complexity that determines WM performance per se, but rather increased…
Descriptors: Visual Perception, Short Term Memory, Test Items, Cognitive Processes
Peer reviewed
Direct link
Hauser, Carl; Thum, Yeow Meng; He, Wei; Ma, Lingling – Educational and Psychological Measurement, 2015
When conducting item reviews, analysts evaluate an array of statistical and graphical information to assess the fit of a field test (FT) item to an item response theory model. The process can be tedious, particularly when the number of human reviews (HR) to be completed is large. Furthermore, such a process leads to decisions that are susceptible…
Descriptors: Test Items, Item Response Theory, Research Methodology, Decision Making
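One statistic an analyst might inspect in such a field-test item review is a chi-square comparing observed and model-expected proportions correct across ability groups. A sketch with simulated data and assumed 2PL parameters; this illustrates the general class of fit checks, not the authors' procedure.

```python
# Item-fit sketch: group examinees by ability deciles and compare
# observed vs. 2PL-expected proportions correct for one item.
import numpy as np

def p_2pl(theta, a=1.0, b=0.2):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

rng = np.random.default_rng(2)
theta = rng.normal(0, 1, 5000)
y = rng.binomial(1, p_2pl(theta))   # responses consistent with the model

bins = np.quantile(theta, np.linspace(0, 1, 11))
idx = np.clip(np.digitize(theta, bins[1:-1]), 0, 9)
chi_sq = 0.0
for g in range(10):
    m = idx == g
    n, obs, exp = m.sum(), y[m].mean(), p_2pl(theta[m]).mean()
    chi_sq += n * (obs - exp) ** 2 / (exp * (1 - exp))
print(chi_sq)  # small values indicate adequate fit for this item
```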
Peer reviewed
Download full text (PDF on ERIC)
Jordan, Sally – European Journal of Science and Mathematics Education, 2014
Inspection of thousands of student responses to computer-marked assessment questions has brought insight into the errors made by adult distance learners of science. Most of the questions analysed were in summative use and required students to construct their own response. Both of these factors increased confidence in the reliability of the…
Descriptors: Foreign Countries, Undergraduate Students, College Science, Science Education
Peer reviewed
Download full text (PDF on ERIC)
Boyer, Kristy Elizabeth, Ed.; Yudelson, Michael, Ed. – International Educational Data Mining Society, 2018
The 11th International Conference on Educational Data Mining (EDM 2018) was held under the auspices of the International Educational Data Mining Society at the Templeton Landing in Buffalo, New York. This year's EDM conference was highly competitive, with 145 long and short paper submissions. Of these, 23 were accepted as full papers and 37…
Descriptors: Data Collection, Data Analysis, Computer Science Education, Program Proposals