Peer reviewed
ERIC Number: EJ1460469
Record Type: Journal
Publication Date: 2025-Mar
Pages: 12
Abstractor: As Provided
ISBN: N/A
ISSN: ISSN-0731-1745
EISSN: EISSN-1745-3992
Available Date: 2024-12-19
Instruction-Tuned Large-Language Models for Quality Control in Automatic Item Generation: A Feasibility Study
Educational Measurement: Issues and Practice, v44 n1 p96-107 2025
Automatic item generation can supply many items to assessment and learning environments quickly and efficiently. Yet evaluating item quality remains a bottleneck for deploying generated items in learning and assessment settings. In this study, we investigated the utility of large language models, specifically Llama 3-8B, for evaluating automatically generated cloze items. The trained large language model accurately identified the majority of both good and bad items. Evaluating items automatically with instruction-tuned LLMs may help educators and test developers assess the quality of generated items in an efficient and scalable manner. LLM-based item evaluation may also serve as an intermediate step between item creation and field testing, reducing the cost and time associated with multiple rounds of revision.
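The screening workflow the abstract describes can be pictured as a simple prompt-classify-filter loop. The sketch below is illustrative only: the study's actual prompts, label scheme, and fine-tuning setup are not given in this record, so the prompt wording, the good/bad labels, and all function names here are hypothetical.

```python
# Hypothetical sketch of LLM-based item quality screening.
# The `generate` callable stands in for a call to an instruction-tuned
# model such as Llama 3-8B; none of these names come from the paper.

def build_eval_prompt(stem: str, key: str) -> str:
    """Format a cloze item as an instruction for an item-quality judge."""
    return (
        "You are an assessment expert. Judge the quality of this cloze item.\n"
        f"Item: {stem}\n"
        f"Key: {key}\n"
        "Respond with exactly one label: good or bad."
    )

def parse_verdict(response: str) -> str:
    """Map a free-text model response onto the assumed good/bad labels."""
    text = response.strip().lower()
    return "good" if "good" in text and "bad" not in text else "bad"

def screen_items(items, generate):
    """Keep only the items the model labels 'good' (the filtering step)."""
    return [
        (stem, key)
        for stem, key in items
        if parse_verdict(generate(build_eval_prompt(stem, key))) == "good"
    ]
```

In practice, items the model flags as bad would be routed back for revision before field testing, which is the intermediate quality-control step the abstract proposes.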
Wiley. Available from: John Wiley & Sons, Inc. 111 River Street, Hoboken, NJ 07030. Tel: 800-835-6770; e-mail: cs-journals@wiley.com; Web site: https://www.wiley.com/en-us
Publication Type: Journal Articles; Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: 1University of Alberta, Leibniz Institute for Science and Mathematics Education, Europa-Universität Flensburg; 2University of Alberta