Akif Avcu – Malaysian Online Journal of Educational Technology, 2025
This scoping review traces the milestones through which Hierarchical Rater Models (HRMs) became operational for use in automated essay scoring (AES) to improve instructional evaluation. Although essay evaluation--a useful instrument for assessing higher-order cognitive abilities--has always depended on human raters, concerns regarding rater bias,…
Descriptors: Automation, Scoring, Models, Educational Assessment
Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025
Recent developments in the use of large language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…
Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy
Ngoc My Bui; Jessie S. Barrot – Education and Information Technologies, 2025
With generative artificial intelligence (AI) tools' remarkable capabilities in understanding and generating meaningful content, intriguing questions have been raised about their potential as automated essay scoring (AES) systems. One such tool is ChatGPT, which is capable of scoring any written work based on predefined criteria. However,…
Descriptors: Artificial Intelligence, Natural Language Processing, Technology Uses in Education, Automation
Nils Myszkowski; Martin Storme – Journal of Creative Behavior, 2025
In the PISA 2022 creative thinking test, students provide a response to a prompt, which is then coded by human raters as no credit, partial credit, or full credit. Like many large-scale educational testing frameworks, PISA uses the generalized partial credit model (GPCM) as a response model for these ordinal ratings. In this paper, we show that…
Descriptors: Creative Thinking, Creativity Tests, Scores, Prompting
Krishna Mohan Surapaneni; Anusha Rajajagadeesan; Lakshmi Goudhaman; Shalini Lakshmanan; Saranya Sundaramoorthi; Dineshkumar Ravi; Kalaiselvi Rajendiran; Porchelvan Swaminathan – Biochemistry and Molecular Biology Education, 2024
The emergence of ChatGPT as one of the most advanced chatbots and its ability to generate diverse data has given room for numerous discussions worldwide regarding its utility, particularly in advancing medical education and research. This study seeks to assess the performance of ChatGPT in medical biochemistry to evaluate its potential as an…
Descriptors: Biochemistry, Science Instruction, Artificial Intelligence, Teaching Methods
Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024
Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…
Descriptors: Semantics, Educational Assessment, Evaluators, Reliability
Cristina Menescardi; Aida Carballo-Fazanes; NĂºria Ortega-Benavent; Isaac Estevan – Journal of Motor Learning and Development, 2024
The Canadian Agility and Movement Skill Assessment (CAMSA) is a valid and reliable circuit-based test of motor competence which can be used to assess children's skills in a live or recorded performance and then coded. We aimed to analyze the intrarater reliability of the CAMSA scores (total, time, and skill score) and time measured, by comparing…
Descriptors: Interrater Reliability, Evaluators, Scoring, Psychomotor Skills
Makiko Kato – Journal of Education and Learning, 2025
This study aims to examine whether differences exist in the factors influencing the difficulty of scoring English summaries and determining scores based on the raters' attributes, and to collect candid opinions, considerations, and tentative suggestions for future improvements to the analytic rubric of summary writing for English learners. In this…
Descriptors: Writing Evaluation, Scoring, Writing Skills, English (Second Language)
Kevin C. Haudek; Xiaoming Zhai – International Journal of Artificial Intelligence in Education, 2024
Argumentation, a key scientific practice presented in the "Framework for K-12 Science Education," requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open response assessments, leveraging…
Descriptors: Accuracy, Persuasive Discourse, Artificial Intelligence, Learning Management Systems
Yishen Song; Qianta Zhu; Huaibo Wang; Qinhua Zheng – IEEE Transactions on Learning Technologies, 2024
Manually scoring and revising student essays has long been a time-consuming task for educators. With the rise of natural language processing techniques, automated essay scoring (AES) and automated essay revising (AER) have emerged to alleviate this burden. However, current AES and AER models require large amounts of training data and lack…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Reagan Mozer; Luke Miratrix; Jackie Eunjung Relyea; James S. Kim – Journal of Educational and Behavioral Statistics, 2024
In a randomized trial that collects text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by human raters. An impact analysis can then be conducted to compare treatment and control groups, using the hand-coded scores as a measured outcome. This…
Descriptors: Scoring, Evaluation Methods, Writing Evaluation, Comparative Analysis
Taichi Yamashita – Language Testing, 2025
With the rapid development of generative artificial intelligence (AI) frameworks (e.g., the generative pre-trained transformer [GPT]), a growing number of researchers have started to explore its potential as an automated essay scoring (AES) system. While previous studies have investigated the alignment between human ratings and GPT ratings, few…
Descriptors: Artificial Intelligence, English (Second Language), Second Language Learning, Second Language Instruction
Selcuk Acar; Denis Dumas; Peter Organisciak; Kelly Berthiaume – Grantee Submission, 2024
Creativity is highly valued in both education and the workforce, but assessing and developing creativity can be difficult without psychometrically robust and affordable tools. The open-ended nature of creativity assessments has made them difficult to score, expensive, often imprecise, and therefore impractical for school- or district-wide use. To…
Descriptors: Thinking Skills, Elementary School Students, Artificial Intelligence, Measurement Techniques
Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024
The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…
Descriptors: Accuracy, Reliability, Computational Linguistics, Standards
Yuko Hayashi; Yusuke Kondo; Yutaka Ishii – Innovation in Language Learning and Teaching, 2024
Purpose: This study builds a new system for automatically assessing learners' speech elicited from an oral discourse completion task (DCT), and evaluates the prediction capability of the system with a view to better understanding factors deemed influential in predicting speaking proficiency scores and the pedagogical implications of the system.…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Japanese