Peer reviewed
ERIC Number: ED674426
Record Type: Non-Journal
Publication Date: 2025-Jul-15
Pages: 15
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: N/A
Can Large Language Models Match Tutoring System Adaptivity? A Benchmarking Study
Grantee Submission, Paper presented at the International Conference on Artificial Intelligence in Education (26th, Palermo, Italy, Jul 22-26, 2025)
Large Language Models (LLMs) hold promise as dynamic instructional aids. Yet it remains unclear whether LLMs can replicate the adaptivity of intelligent tutoring systems (ITS), in which student knowledge and pedagogical strategies are explicitly modeled. We propose a prompt-variation framework to assess the adaptivity and pedagogical soundness of LLM-generated instructional moves across 75 real-world tutoring scenarios drawn from an ITS. We systematically remove key context components (e.g., student errors and knowledge components) from prompts to create variations of each scenario, and three representative LLMs (Llama3-8B, Llama3-70B, and GPT-4o) generate 1,350 instructional moves. We use text embeddings and randomization tests to measure how omitting each context feature shifts the LLMs' outputs (adaptivity), and a validated tutor-training classifier to evaluate response quality (pedagogical soundness). Surprisingly, even the best-performing model only marginally mimics the adaptivity of the ITS; specifically, Llama3-70B demonstrates statistically significant adaptivity to student errors. Although Llama3-8B's recommendations receive higher pedagogical soundness scores than the other models', it struggles with instruction-following behaviors, including output formatting. By contrast, GPT-4o reliably adheres to instructions but tends to provide overly direct feedback, diverging from effective tutoring practices such as prompting learners with open-ended questions to gauge their knowledge. Given these results, we discuss why current LLM-based tutoring is unlikely to produce learning benefits rivaling those of known-to-be-effective ITS tutoring. Through our open-source benchmarking code, we contribute a reproducible method for evaluating LLMs' instructional adaptivity and fidelity.
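The abstract only sketches the measurement pipeline, so the following minimal Python sketch illustrates the general embedding-plus-randomization-test idea: compare outputs generated with and without a context component, and test whether their embedding-space separation exceeds chance. This is not the authors' released benchmarking code; the embedding dimension, sample sizes, permutation count, and mean cross-condition cosine distance statistic are illustrative assumptions, and the stand-in NumPy vectors would in practice be embeddings of the generated instructional moves.

import numpy as np

rng = np.random.default_rng(0)

def cosine_distance(a, b):
    # 1 - cosine similarity between two 1-D vectors.
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def mean_cross_distance(full, ablated):
    # Average cosine distance between every full-context output
    # embedding and every ablated-context output embedding.
    return np.mean([cosine_distance(f, g) for f in full for g in ablated])

def randomization_test(full, ablated, n_perm=2000):
    # Permutation (randomization) test: shuffle condition labels and
    # recompute the statistic to build a null distribution.
    observed = mean_cross_distance(full, ablated)
    pooled = np.vstack([full, ablated])
    n = len(full)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if mean_cross_distance(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)  # smoothed p-value

# Toy data: 10 embeddings per condition (384-dim, an assumed size).
# A mean shift stands in for "omitting the context feature changed
# the model's instructional moves".
full_ctx = rng.normal(0.0, 1.0, size=(10, 384))
ablated_ctx = rng.normal(0.3, 1.0, size=(10, 384))
dist, p = randomization_test(full_ctx, ablated_ctx)
print(f"mean cross-condition distance = {dist:.3f}, p = {p:.4f}")

A small permutation p-value under this kind of test indicates the model's outputs are sensitive to the omitted context component, i.e., adaptive to it; a large p-value suggests the component had no measurable effect on what the model generated.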
Publication Type: Speeches/Meeting Papers; Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: Institute of Education Sciences (ED)
Authoring Institution: N/A
IES Funded: Yes
Grant or Contract Numbers: R305A220386
Department of Education Funded: Yes