ERIC - Search Results

Publication Date

In 2025	1
Since 2024	2
Since 2021 (last 5 years)	7
Since 2016 (last 10 years)	12
Since 2006 (last 20 years)	23

Descriptor

Computer Assisted Testing	25
Reliability	25
Scoring	25
Validity	14
Correlation	10
Essays	10
Comparative Analysis	8
Automation	7
Writing Evaluation	7
Computer Software	6
Foreign Countries	6
Student Evaluation	6
Accuracy	5
Evaluation Methods	5
Evaluators	5
Standardized Tests	5
Writing Tests	5
English (Second Language)	4
Essay Tests	4
Scores	4
Scoring Rubrics	4
Second Language Learning	4
Task Analysis	4
Accountability	3
Artificial Intelligence	3

Publication Type

Journal Articles	18
Reports - Research	14
Reports - Descriptive	5
Reports - Evaluative	3
Speeches/Meeting Papers	2
Collected Works - General	1
Dissertations/Theses -…	1
Information Analyses	1
Reports - General	1

Education Level

Higher Education	12
Postsecondary Education	11
Elementary Secondary Education	4
Elementary Education	3
High Schools	2
Secondary Education	2
Early Childhood Education	1
Grade 3	1
Grade 4	1
Grade 5	1
Intermediate Grades	1
Middle Schools	1
Primary Education	1

Audience

Policymakers

Location

Australia	3
Canada	2
Connecticut	2
New Hampshire	2
New York	2
Rhode Island	2
Singapore	2
United Kingdom (England)	2
Vermont	2
Austria	1
Belgium	1
Chile	1
China	1
Cyprus	1
Czech Republic	1
Denmark	1
Estonia	1
France	1
Germany	1
Ireland	1
Italy	1
Japan	1
Netherlands	1
North Carolina (Greensboro)	1
Norway	1

Laws, Policies, & Programs

Every Student Succeeds Act…	2
Elementary and Secondary…	1
No Child Left Behind Act 2001	1

Assessments and Surveys

National Assessment of…	3
New York State Regents…	2
Test of English as a Foreign…	2
Dynamic Indicators of Basic…	1
Graduate Record Examinations	1
Minnesota Multiphasic…	1
United States Medical…	1

Showing 1 to 15 of 25 results

Grading Exams Using Large Language Models: A Comparison between Human and AI Grading of Exams in Higher Education Using ChatGPT

Peer reviewed

Jonas Flodén – British Educational Research Journal, 2025

This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams compared to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…

Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring

On the Limitations of Human-Computer Agreement in Automated Essay Scoring

Peer reviewed
PDF on ERIC

Doewes, Afrizal; Pechenizkiy, Mykola – International Educational Data Mining Society, 2021

Scoring essays is generally an exhausting and time-consuming task for teachers. Automated Essay Scoring (AES) makes the scoring process faster and more consistent. The most logical way to assess the performance of an automated scorer is by measuring the score agreement with the human raters. However, we provide empirical evidence that…

Descriptors: Man Machine Systems, Automation, Computer Assisted Testing, Scoring

Examining Human and Automated Ratings of Elementary Students' Writing Quality: A Multivariate Generalizability Theory Application

Peer reviewed

Chen, Dandan; Hebert, Michael; Wilson, Joshua – American Educational Research Journal, 2022

We used multivariate generalizability theory to examine the reliability of hand-scoring and automated essay scoring (AES) and to identify how these scoring methods could be used in conjunction to optimize writing assessment. Students (n = 113) included subsamples of struggling writers and non-struggling writers in Grades 3-5 drawn from a larger…

Descriptors: Reliability, Scoring, Essays, Automation

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards

Semantic Distance and the Alternate Uses Task: Recommendations for Reliable Automated Assessment of Originality

Peer reviewed

Beaty, Roger E.; Johnson, Dan R.; Zeitlen, Daniel C.; Forthmann, Boris – Creativity Research Journal, 2022

Semantic distance is increasingly used for automated scoring of originality on divergent thinking tasks, such as the Alternate Uses Task (AUT). Despite some psychometric support for semantic distance -- including positive correlations with human creativity ratings -- additional work is needed to optimize its reliability and validity, including…

Descriptors: Semantics, Scoring, Creative Thinking, Creativity

Validation of an Automated Procedure for Calculating Core Lexicon from Transcripts

Peer reviewed

Dalton, Sarah Grace; Stark, Brielle C.; Fromm, Davida; Apple, Kristen; MacWhinney, Brian; Rensch, Amanda; Rowedder, Madyson – Journal of Speech, Language, and Hearing Research, 2022

Purpose: The aim of this study was to advance the use of structured, monologic discourse analysis by validating an automated scoring procedure for core lexicon (CoreLex) using transcripts. Method: Forty-nine transcripts from persons with aphasia and 48 transcripts from persons with no brain injury were retrieved from the AphasiaBank database. Five…

Descriptors: Validity, Discourse Analysis, Databases, Scoring

The Influence of Rater Effects in Training Sets on the Psychometric Quality of Automated Scoring for Writing Assessments

Peer reviewed

Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018

Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…

Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring

Automated L2 Writing Performance Assessment: A Literature Review

Peer reviewed

Sari, Elif; Han, Turgay – Reading Matrix: An International Online Journal, 2021

Providing both effective feedback applications and reliable assessment practices are two central issues in ESL/EFL writing instruction contexts. Giving individual feedback is very difficult in crowded classes as it requires a great amount of time and effort for instructors. Moreover, instructors likely employ inconsistent assessment procedures,…

Descriptors: Automation, Writing Evaluation, Artificial Intelligence, Natural Language Processing

Characterizing Students' Ideas about the Effects of a Mutation in a Noncoding Region of DNA

Peer reviewed

Sieke, Scott A.; McIntosh, Betsy B.; Steele, Matthew M.; Knight, Jennifer K. – CBE - Life Sciences Education, 2019

Understanding student ideas in large-enrollment biology courses can be challenging, because easy-to-administer multiple-choice questions frequently do not fully capture the diversity of student ideas. As part of the Automated Analysis of Constructed Responses (AACR) project, we designed a question prompting students to describe the possible…

Descriptors: Genetics, Scientific Concepts, Biology, Science Instruction

Internet Administration of the Paper-and-Pencil Gifted Rating Scale: Assessing Psychometric Equivalence

Peer reviewed

Yarnell, Jordy B.; Pfeiffer, Steven I. – Journal of Psychoeducational Assessment, 2015

The present study examined the psychometric equivalence of administering a computer-based version of the Gifted Rating Scale (GRS) compared with the traditional paper-and-pencil GRS-School Form (GRS-S). The GRS-S is a teacher-completed rating scale used in gifted assessment. The GRS-Electronic Form provides an alternative method of administering…

Descriptors: Gifted, Psychometrics, Rating Scales, Computer Assisted Testing

Investigating the Application of Automated Writing Evaluation to Chinese Undergraduate English Majors: A Case Study of "WriteToLearn"

Peer reviewed
PDF on ERIC

Liu, Sha; Kunnan, Antony John – CALICO Journal, 2016

This study investigated the application of "WriteToLearn" on Chinese undergraduate English majors' essays in terms of its scoring ability and the accuracy of its error feedback. Participants were 163 second-year English majors from a university located in Sichuan province who wrote 326 essays from two writing prompts. Each paper was…

Descriptors: Foreign Countries, Undergraduate Students, English (Second Language), Second Language Learning

Examining Increased Flexibility in Assessment Formats

Peer reviewed

Irwin, Brian; Hepplestone, Stuart – Assessment & Evaluation in Higher Education, 2012

There have been calls in the literature for changes to assessment practices in higher education, to increase flexibility and give learners more control over the assessment process. This article explores the possibilities of allowing student choice in the format used to present their work, as a starting point for changing assessment, based on…

Descriptors: Student Evaluation, College Students, Selection, Computer Assisted Testing

Comparison of Automated Scoring Methods for a Computerized Performance Assessment of Clinical Judgment

Peer reviewed

Harik, Polina; Baldwin, Peter; Clauser, Brian – Applied Psychological Measurement, 2013

Growing reliance on complex constructed response items has generated considerable interest in automated scoring solutions. Many of these solutions are described in the literature; however, relatively few studies have been published that "compare" automated scoring strategies. Here, comparisons are made among five strategies for…

Descriptors: Computer Assisted Testing, Automation, Scoring, Comparative Analysis

Developing and Measuring Higher Order Skills: Models for State Performance Assessment Systems. Research Brief

Peer reviewed
PDF on ERIC

Darling-Hammond, Linda – Learning Policy Institute, 2017

After passage of the Every Student Succeeds Act (ESSA) in 2015, states assumed greater responsibility for designing their own accountability and assessment systems. ESSA requires states to measure "higher order thinking skills and understanding" and encourages the use of open-ended performance assessments, which are essential for…

Descriptors: Performance Based Assessment, Accountability, Portfolios (Background Materials), Task Analysis

Automated Trait Scores for "GRE"® Writing Tasks. Research Report. ETS RR-15-15

Peer reviewed
PDF on ERIC

Attali, Yigal; Sinharay, Sandip – ETS Research Report Series, 2015

The "e-rater"® automated essay scoring system is used operationally in the scoring of the argument and issue tasks that form the Analytical Writing measure of the "GRE"® General Test. For each of these tasks, this study explored the value added of reporting 4 trait scores for each of these 2 tasks over the total e-rater score.…

Descriptors: Scores, Computer Assisted Testing, Computer Software, Grammar

Source

Journal of Technology,…	2
Advances in Physiology…	1
American Educational Research…	1
Applied Linguistics	1
Applied Psychological…	1
Assessment & Evaluation in…	1
British Educational Research…	1
CALICO Journal	1
CBE - Life Sciences Education	1
Center for American Progress	1
Council of Chief State School…	1
Creativity Research Journal	1
ETS Research Report Series	1
Higher Education Quarterly	1
International Educational…	1
International Journal of…	1
Journal of Clinical Psychology	1
Journal of Psychoeducational…	1
Journal of Speech, Language,…	1
Learning Policy Institute	1
OECD Publishing	1
ProQuest LLC	1
Reading Matrix: An…	1

Author

Attali, Yigal	2
Darling-Hammond, Linda	2
Amanda Huee-Ping Wong	1
Apple, Kristen	1
Baldwin, Peter	1
Beaty, Roger E.	1
Brown, Gavin T. L.	1
Burstein, Jill	1
Chen, Dandan	1
Clauser, Brian	1
Dalton, Sarah Grace	1
Davis, Lawrence Edward	1
Dikli, Semire	1
Doewes, Afrizal	1
Engelhard, George, Jr.	1
Foltz, Peter	1
Forthmann, Boris	1
Fromm, Davida	1
Gentile, Claudia	1
Han, Turgay	1
Harik, Polina	1
Hebert, Michael	1
Hepplestone, Stuart	1
Irwin, Brian	1