Artificial Intelligence in International English Language Testing System Writing Assessments: A Comparative Study of Human Ratings and DeepAI.

Somayeh Fathali; Fatemeh Mohajeri

Peer reviewed

ERIC Number: EJ1489503

Record Type: Journal

Publication Date: 2025

Pages: 18

Abstractor: As Provided

ISBN: N/A

ISSN: N/A

EISSN: 2652-1687

Available Date: 0000-00-00

Technology in Language Teaching & Learning, v7 n4 Article 103131 2025

The International English Language Testing System (IELTS) is a high-stakes exam in which Writing Task 2 significantly influences the overall score, so its evaluation must be reliable. While trained human raters perform this task, concerns about subjectivity and inconsistency have led to growing interest in artificial intelligence (AI)-based assessment tools. However, little empirical evidence exists on AI in high-stakes testing, and no study has examined DeepAI in this context. Accordingly, using a repeated-measures design, this study investigated the comparability and reliability of human and DeepAI ratings of 145 IELTS Writing Task 2 essays collected from the official IELTS Tehran Test Centre. These essays had previously been scored by certified human examiners and were subsequently rescored by DeepAI using a rubric-based prompt aligned with IELTS standards. Statistical analyses, including paired-samples t-tests and multivariate analysis of variance, were conducted to explore rater differences and scoring alignment. The results revealed no significant differences in the overall band scores between the human and AI assessments; however, minor differences were observed in some specific criteria. Additionally, DeepAI showed strong intra-rater reliability, producing consistent scores over a two-week interval. These findings suggest that DeepAI may serve as a reliable supplementary tool in high-stakes writing assessments. However, full replacement of human judgment remains premature, and a combination of human judgment and AI support may be the most effective approach.

Descriptors: English (Second Language), Language Tests, Second Language Learning, Artificial Intelligence, Writing Tests, High Stakes Tests, Foreign Countries, Scoring, Interrater Reliability, Automation

Castledown Publishers. Ground Level, 470 St Kilda Road, Melbourne, 3004, Australia. Tel: +61-3-7003-8355; e-mail: contact@castledown.com; Web site: https://www.castledown.com/journals/tltl

Publication Type: Journal Articles; Reports - Research

Education Level: N/A

Audience: N/A

Language: English

Sponsor: N/A

Authoring Institution: N/A

Identifiers - Location: Iran (Tehran)

Identifiers - Assessments and Surveys: International English Language Testing System

Grant or Contract Numbers: N/A

Author Affiliations: N/A