Publication Date
In 2025 | 3 |
Since 2024 | 11 |
Since 2021 (last 5 years) | 25 |
Since 2016 (last 10 years) | 32 |
Since 2006 (last 20 years) | 36 |
Descriptor
Comparative Analysis | 37 |
Computer Software | 37 |
Evaluators | 37 |
Artificial Intelligence | 17 |
Computational Linguistics | 14 |
Scoring | 13 |
Accuracy | 12 |
English (Second Language) | 12 |
Evaluation Methods | 12 |
Foreign Countries | 12 |
Second Language Learning | 12 |
Author
Abdalla, Mohamed | 1 |
Aggarwal, Varun | 1 |
Ahmet Can Uyar | 1 |
Akbari, Alireza | 1 |
Amanda Huee-Ping Wong | 1 |
Armijo-Olivo, Susan | 1 |
Azza Warraitch | 1 |
Bosch, Nigel | 1 |
Brannen, Kathleen | 1 |
Breyer, F. Jay | 1 |
Campbell, Sandy | 1 |
Publication Type
Reports - Research | 33 |
Journal Articles | 31 |
Speeches/Meeting Papers | 4 |
Tests/Questionnaires | 3 |
Reports - Evaluative | 2 |
Collected Works - Proceedings | 1 |
Dissertations/Theses -… | 1 |
Reports - Descriptive | 1 |
Education Level
Higher Education | 10 |
Postsecondary Education | 10 |
Secondary Education | 4 |
Elementary Education | 3 |
Early Childhood Education | 2 |
Elementary Secondary Education | 2 |
Primary Education | 2 |
Grade 1 | 1 |
Grade 2 | 1 |
Grade 4 | 1 |
Grade 8 | 1 |
Yuan Tian; Xi Yang; Suhail A. Doi; Luis Furuya-Kanamori; Lifeng Lin; Joey S. W. Kwong; Chang Xu – Research Synthesis Methods, 2024
RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding the risk of bias assessment based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two…
Descriptors: Risk, Randomized Controlled Trials, Classification, Robotics
Qusai Khraisha; Sophie Put; Johanna Kappenberg; Azza Warraitch; Kristin Hadfield – Research Synthesis Methods, 2024
Systematic reviews are vital for guiding practice, research and policy, although they are often slow and labour-intensive. Large language models (LLMs) could speed up and automate systematic reviews, but their performance in such tasks has yet to be comprehensively evaluated against humans, and no study has tested Generative Pre-Trained…
Descriptors: Peer Evaluation, Research Reports, Artificial Intelligence, Computer Software
Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025
As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…
Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy
Kevin C. Haudek; Xiaoming Zhai – International Journal of Artificial Intelligence in Education, 2024
Argumentation, a key scientific practice presented in the "Framework for K-12 Science Education," requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open response assessments, leveraging…
Descriptors: Accuracy, Persuasive Discourse, Artificial Intelligence, Learning Management Systems
Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025
This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics
Yishen Song; Qianta Zhu; Huaibo Wang; Qinhua Zheng – IEEE Transactions on Learning Technologies, 2024
Manually scoring and revising student essays has long been a time-consuming task for educators. With the rise of natural language processing techniques, automated essay scoring (AES) and automated essay revising (AER) have emerged to alleviate this burden. However, current AES and AER models require large amounts of training data and lack…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Zhang, Mengxue; Heffernan, Neil; Lan, Andrew – International Educational Data Mining Society, 2023
Automated scoring of student responses to open-ended questions, including short-answer questions, has great potential to scale to a large number of responses. Recent approaches for automated scoring rely on supervised learning, i.e., training classifiers or fine-tuning language models on a small number of responses with human-provided score…
Descriptors: Scoring, Computer Assisted Testing, Mathematics Instruction, Mathematics Tests
Sanosi, Abdulaziz; Abdalla, Mohamed – Australian Journal of Applied Linguistics, 2021
This study aimed to examine the potentials of the NLP approach in detecting discourse markers (DMs), namely "okay," in transcribed spoken data. One hundred thirty-eight concordance lines were presented to human referees to judge the functions of "okay" in them as a DM or Non-DM. After that, the researchers used a Python script written according to the…
Descriptors: Natural Language Processing, Computational Linguistics, Programming Languages, Accuracy
Reagan Mozer; Luke Miratrix; Jackie Eunjung Relyea; James S. Kim – Journal of Educational and Behavioral Statistics, 2024
In a randomized trial that collects text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by human raters. An impact analysis can then be conducted to compare treatment and control groups, using the hand-coded scores as a measured outcome. This…
Descriptors: Scoring, Evaluation Methods, Writing Evaluation, Comparative Analysis
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Leech, Tony; Chambers, Lucy – Research Matters, 2022
Two of the central issues in comparative judgement (CJ), which are perhaps underexplored compared to questions of the method's reliability and technical quality, are "what processes do judges use to make their decisions" and "what features do they focus on when making their decisions?" This article discusses both, in the…
Descriptors: Comparative Analysis, Decision Making, Evaluators, Reliability
Osama Koraishi – Language Teaching Research Quarterly, 2024
This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability
Jia, Qinjin; Young, Mitchell; Xiao, Yunkai; Cui, Jialin; Liu, Chengyuan; Rashid, Parvez; Gehringer, Edward – International Educational Data Mining Society, 2022
Providing timely feedback is crucial in promoting academic achievement and student success. However, for multifarious reasons (e.g., limited teaching resources), feedback often arrives too late for learners to act on the feedback and improve learning. Thus, automated feedback systems have emerged to tackle educational tasks in various domains,…
Descriptors: Student Projects, Feedback (Response), Natural Language Processing, Guidelines
Ahmet Can Uyar; Dilek Büyükahiska – International Journal of Assessment Tools in Education, 2025
This study explores the effectiveness of using ChatGPT, an Artificial Intelligence (AI) language model, as an Automated Essay Scoring (AES) tool for grading English as a Foreign Language (EFL) learners' essays. The corpus consists of 50 essays representing various types including analysis, compare and contrast, descriptive, narrative, and opinion…
Descriptors: Artificial Intelligence, Computer Software, Technology Uses in Education, Teaching Methods