ERIC - Search Results

Showing 1 to 15 of 31 results

Assessing Penmanship of Chinese Handwriting: A Deep Learning-Based Approach

Peer reviewed

Zebo Xu; Prerit S. Mittal; Mohd. Mohsin Ahmed; Chandranath Adak; Zhenguang G. Cai – Reading and Writing: An Interdisciplinary Journal, 2025

The rise of the digital era has led to a decline in handwriting as the primary mode of communication, resulting in negative effects on handwriting literacy, particularly in complex writing systems such as Chinese. The marginalization of handwriting has contributed to the deterioration of penmanship, defined as the ability to write aesthetically…

Descriptors: Handwriting, Writing Skills, Chinese, Ideography

The Vulnerability of AI-Based Scoring Systems to Gaming Strategies: A Case Study

Peer reviewed

Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025

Recent developments in the use of large-language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…

Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy

Integrative Evaluation

Peer reviewed

Feinstein, Osvaldo – American Journal of Evaluation, 2023

"Integrative evaluation" is an approach with two main phases: identification of plausible rival hypotheses and integration of rival hypotheses. The first phase may correspond to traditional adversary evaluation, whereas the second phase, that is not included in adversary evaluation, requires integrative thinking which can be applied when…

Descriptors: Evaluation, Integrated Activities, Intervention, Evaluators

Administration, Labor, and Love

Peer reviewed

Ginther, April – Language Testing, 2023

Great opportunities for language testing practitioners are enabled through language program administration. Local language tests lend themselves to multiple purposes--for placement and diagnosis, as a means of tracking progress, and as a contribution to program evaluation and revision. Administrative choices, especially those involving a test, are…

Descriptors: Language Tests, Testing, Examiners, Placement Tests

Teacher Perceptions of Psychological Reports: An Empirical Comparison of District Evaluators' and Contracted Evaluators' Report Styles

Peter Stern – ProQuest LLC, 2021

Across the country, school districts are increasingly seeking out privately contracted psychologists to conduct psychological evaluations. As such, it is increasingly important that psychological reports adhere to best practices and are written to ensure comprehension by both parents and teachers. This study explored the potential differences…

Descriptors: Teachers, Special Education Teachers, Teacher Attitudes, Psychological Evaluation

Using Linkage Sets to Improve Connectedness in Rater Response Model Estimation

Peer reviewed

Casabianca, Jodi M.; Donoghue, John R.; Shin, Hyo Jeong; Chao, Szu-Fu; Choi, Ikkyu – Journal of Educational Measurement, 2023

Using item-response theory to model rater effects provides an alternative solution for rater monitoring and diagnosis, compared to using standard performance metrics. In order to fit such models, the ratings data must be sufficiently connected in order to estimate rater effects. Due to popular rating designs used in large-scale testing scenarios,…

Descriptors: Item Response Theory, Alternative Assessment, Evaluators, Research Problems

Evaluating Quadratic Weighted Kappa as the Standard Performance Metric for Automated Essay Scoring

Peer reviewed

Doewes, Afrizal; Kurdhi, Nughthoh Arfawi; Saxena, Akrati – International Educational Data Mining Society, 2023

Automated Essay Scoring (AES) tools aim to improve the efficiency and consistency of essay scoring by using machine learning algorithms. In the existing research work on this topic, most researchers agree that human-automated score agreement remains the benchmark for assessing the accuracy of machine-generated scores. To measure the performance of…

Descriptors: Essays, Writing Evaluation, Evaluators, Accuracy

Automated Essay Scoring and Revising Based on Open-Source Large Language Models

Peer reviewed

Yishen Song; Qianta Zhu; Huaibo Wang; Qinhua Zheng – IEEE Transactions on Learning Technologies, 2024

Manually scoring and revising student essays has long been a time-consuming task for educators. With the rise of natural language processing techniques, automated essay scoring (AES) and automated essay revising (AER) have emerged to alleviate this burden. However, current AES and AER models require large amounts of training data and lack…

Descriptors: Scoring, Essays, Writing Evaluation, Computer Software

Modeling and Analyzing Scorer Preferences in Short-Answer Math Questions

Peer reviewed

Zhang, Mengxue; Heffernan, Neil; Lan, Andrew – International Educational Data Mining Society, 2023

Automated scoring of student responses to open-ended questions, including short-answer questions, has great potential to scale to a large number of responses. Recent approaches for automated scoring rely on supervised learning, i.e., training classifiers or fine-tuning language models on a small number of responses with human-provided score…

Descriptors: Scoring, Computer Assisted Testing, Mathematics Instruction, Mathematics Tests

Application of an Automated Essay Scoring Engine to English Writing Assessment Using Many-Facet Rasch Measurement

Peer reviewed

Chan, Kinnie Kin Yee; Bond, Trevor; Yan, Zi – Language Testing, 2023

We investigated the relationship between the scores assigned by an Automated Essay Scoring (AES) system, the Intelligent Essay Assessor (IEA), and grades allocated by trained, professional human raters to English essay writing by instigating two procedures novel to written-language assessment: the logistic transformation of AES raw scores into…

Descriptors: Computer Assisted Testing, Essays, Scoring, Scores

Meta-Analysis of Inter-Rater Agreement and Discrepancy Between Human and Automated English Essay Scoring

Peer reviewed

Jiyeo Yun – English Teaching, 2023

Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…

Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring

Artificial Intelligence as an Automated Essay Scoring Tool: A Focus on ChatGPT

Peer reviewed

Ahmet Can Uyar; Dilek Büyükahiska – International Journal of Assessment Tools in Education, 2025

This study explores the effectiveness of using ChatGPT, an Artificial Intelligence (AI) language model, as an Automated Essay Scoring (AES) tool for grading English as a Foreign Language (EFL) learners' essays. The corpus consists of 50 essays representing various types including analysis, compare and contrast, descriptive, narrative, and opinion…

Descriptors: Artificial Intelligence, Computer Software, Technology Uses in Education, Teaching Methods

Measuring Original Thinking in Elementary School: Development and Validation of a Computational Psychometric Approach

Peer reviewed

Selcuk Acar; Denis Dumas; Peter Organisciak; Kelly Berthiaume – Grantee Submission, 2024

Creativity is highly valued in both education and the workforce, but assessing and developing creativity can be difficult without psychometrically robust and affordable tools. The open-ended nature of creativity assessments has made them difficult to score, expensive, often imprecise, and therefore impractical for school- or district-wide use. To…

Descriptors: Thinking Skills, Elementary School Students, Artificial Intelligence, Measurement Techniques

Mitigating Gender and L1 Biases in Automated English Speaking Assessment

Alexander James Kwako – ProQuest LLC, 2023

Automated assessment using Natural Language Processing (NLP) has the potential to make English speaking assessments more reliable, authentic, and accessible. Yet without careful examination, NLP may exacerbate social prejudices based on gender or native language (L1). Current NLP-based assessments are prone to such biases, yet research and…

Descriptors: Gender Bias, Natural Language Processing, Native Language, Computational Linguistics

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards
