ERIC Number: ED675596
Record Type: Non-Journal
Publication Date: 2025
Pages: 11
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
A LLM-Powered Automatic Grading Framework with Human-Level Guidelines Optimization
Yucheng Chu; Hang Li; Kaiqi Yang; Harry Shomer; Yasemin Copur-Gencturk; Leonora Kaldaras; Kevin Haudek; Joseph Krajcik; Namsoo Shin; Hui Liu; Jiliang Tang
International Educational Data Mining Society, Paper presented at the International Conference on Educational Data Mining (EDM) (18th, Palermo, Italy, Jul 20-23, 2025)
Open-text responses provide researchers and educators with rich, nuanced insights that multiple-choice questions cannot capture. When reliably assessed, such responses have the potential to enhance teaching and learning. However, scaling and consistently capturing these nuances remain significant challenges, limiting the widespread use of open-text questions in educational research and assessments. In this paper, we introduce and evaluate "GradeOpt," a unified multi-agent automatic short-answer grading (ASAG) framework that leverages large language models (LLMs) as graders for short-answer responses. More importantly, "GradeOpt" incorporates two additional LLM-based agents, the "reflector" and the "refiner," into the multi-agent system. This enables "GradeOpt" to automatically optimize the original grading guidelines by performing self-reflection on its errors. To assess the effectiveness of "GradeOpt," we conducted experiments on two representative ASAG datasets, which include items designed to capture key aspects of teachers' pedagogical knowledge and students' learning progress. Our results demonstrate that "GradeOpt" consistently outperforms representative baselines in both grading accuracy and alignment with human evaluators across different knowledge domains. Finally, comprehensive ablation studies validate the contributions of the individual components of "GradeOpt," confirming their impact on overall performance. [For the complete proceedings, see ED675583.]
Descriptors: Grading, Automation, Artificial Intelligence, Natural Language Processing, Educational Technology, Technology Uses in Education, Verbal Tests, Cues, Guidelines, Reflection, Error Correction
International Educational Data Mining Society. e-mail: admin@educationaldatamining.org; Web site: https://educationaldatamining.org/conferences/
Publication Type: Speeches/Meeting Papers; Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: National Science Foundation (NSF)
Authoring Institution: N/A
Grant or Contract Numbers: 1813760; 2405483; 2200757; 2234015
Author Affiliations: N/A

Peer reviewed
