ERIC - Search Results

Showing 1 to 15 of 19 results

Grading Exams Using Large Language Models: A Comparison between Human and AI Grading of Exams in Higher Education Using ChatGPT

Peer reviewed

Jonas Flodén – British Educational Research Journal, 2025

This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams compared to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…

Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring

Using Automated Procedures to Score Educational Essays Written in Three Languages

Peer reviewed

Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025

The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…

Descriptors: College Students, Slavic Languages, German, Italian

AI-Assisted Assessment of Inquiry Skills in Socioscientific Issue Contexts

Peer reviewed

Wen Xin Zhang; John J. H. Lin; Ying-Shao Hsu – Journal of Computer Assisted Learning, 2025

Background Study: Assessing learners' inquiry-based skills is challenging as social, political, and technological dimensions must be considered. The advanced development of artificial intelligence (AI) makes it possible to address these challenges and shape the next generation of science education. Objectives: The present study evaluated the SSI…

Descriptors: Artificial Intelligence, Computer Assisted Testing, Inquiry, Active Learning

Evidence-Based Evaluation of Student and Marker Performances in Assessment and Examination

Peer reviewed

Ole J. Kemi – Advances in Physiology Education, 2025

Students are assessed by coursework and/or exams, all of which are marked by assessors (markers). Student and marker performances are then subject to end-of-session board of examiner handling and analysis. This occurs annually and is the basis for evaluating students but also the wider learning and teaching efficiency of an academic institution.…

Descriptors: Undergraduate Students, Evaluation Methods, Evaluation Criteria, Academic Standards

Examining AI-Based Accuracy Assessment in L2 Learners' Writing

Peer reviewed

On-Soon Lee – Journal of Pan-Pacific Association of Applied Linguistics, 2024

Despite the increasing interest in using AI tools as assistant agents in instructional settings, the effectiveness of ChatGPT, the generative pretrained AI, for evaluating the accuracy of second language (L2) writing has been largely unexplored in formative assessment. Therefore, the current study aims to examine how ChatGPT, as an evaluator,…

Descriptors: Foreign Countries, Undergraduate Students, English (Second Language), Second Language Learning

Teacher Perceptions of Psychological Reports: An Empirical Comparison of District Evaluators' and Contracted Evaluators' Report Styles

Peter Stern – ProQuest LLC, 2021

Across the country, school districts are increasingly seeking out privately contracted psychologists to conduct psychological evaluations. As such, it is increasingly important that psychological reports adhere to best practices and are written to ensure comprehension by both parents and teachers. This study explored the potential differences…

Descriptors: Teachers, Special Education Teachers, Teacher Attitudes, Psychological Evaluation

Putting Raters in Ratees' Shoes: Perspective Taking and Assessment of Creative Products

Peer reviewed

Han, Jiantao; Long, Haiying; Pang, Weiguo – Creativity Research Journal, 2017

This study reported 2 experiments that studied the effect of perspective taking on assessment of creative products by using human raters. Forty responses of 2 alternative uses tasks (AUTs) and 15 alien stories generated by 6th-grade students were used as assessment materials. Undergraduate students as the novice raters assessed the products under…

Descriptors: Perspective Taking, Creativity, Undergraduate Students, Psychology

Comparison of Integrated Testlet and Constructed-Response Question Formats

Peer reviewed

Slepkov, Aaron D.; Shiell, Ralph C. – Physical Review Special Topics - Physics Education Research, 2014

Constructed-response (CR) questions are a mainstay of introductory physics textbooks and exams. However, because of the time, cost, and scoring reliability constraints associated with this format, CR questions are being increasingly replaced by multiple-choice (MC) questions in formal exams. The integrated testlet (IT) is a recently developed…

Descriptors: Science Tests, Physics, Responses, Multiple Choice Tests

Findings from the 2012 West Virginia Online Writing Scoring Comparability Study

Hixson, Nate; Rhudy, Vaughn – West Virginia Department of Education, 2013

Student responses to the West Virginia Educational Standards Test (WESTEST) 2 Online Writing Assessment are scored by a computer-scoring engine. The scoring method is not widely understood among educators, and there exists a misperception that it is not comparable to hand scoring. To address these issues, the West Virginia Department of Education…

Descriptors: Scoring Formulas, Scoring Rubrics, Interrater Reliability, Test Scoring Machines

Peer Assessment without Assessment Criteria

Peer reviewed

Jones, Ian; Alcock, Lara – Studies in Higher Education, 2014

Peer assessment typically requires students to judge peers' work against assessment criteria. We tested an alternative approach in which students judged pairs of scripts against one another in the absence of assessment criteria. First year mathematics undergraduates (N = 194) sat a written test on conceptual understanding of multivariable…

Descriptors: Peer Evaluation, Evaluation Criteria, Alternative Assessment, Undergraduate Students

Using Technology to Facilitate Grading Consistency in Large Classes

Peer reviewed

Cathcart, Abby; Neale, Larry – Marketing Education Review, 2012

University classes in marketing are often large and therefore require teams of teachers to cover all of the necessary activities. A major problem with teaching teams is the inconsistency that results from myriad individuals offering subjective opinions (Preston 1997). This innovation uses the latest moderation techniques along with Audience…

Descriptors: Marketing, College Instruction, Team Teaching, Class Size

Business Education Innovation: How Common Exams Can Improve University Teaching

Peer reviewed

Unger, Darian – American Journal of Business Education, 2010

Although there is significant research on improving college-level teaching practices, most literature in the field assumes an incentive for improvement. The research presented in this paper addresses the issue of poor incentives for improving university-level teaching. Specifically, it proposes instructor-designed common examinations as an…

Descriptors: Educational Innovation, Educational Improvement, Instructional Improvement, Business Administration Education

How Accurate Can Mothers and Teachers Be regarding Children's Emergent Literacy Development? A Comparison between Mothers with High and Low Education

Peer reviewed

Korat, Ofra – Early Child Development and Care, 2009

The relationship between mothers' and educators' evaluation of 75 children's emergent literacy levels and actual levels were investigated. Two groups of mothers participated: mothers with a low education and mothers with a high education. The children's emergent literacy was measured. The mothers evaluated their own children and 40 teachers…

Descriptors: Mothers, Emergent Literacy, Interrater Reliability, Mother Attitudes

Keyboarding Compared with Handwriting on a High-Stakes Writing Assessment: Student Choice of Composing Medium, Raters' Perceptions, and Text Quality

Peer reviewed

Whithaus, Carl; Harrison, Scott B.; Midyette, Jeb – Assessing Writing, 2008

This article examines the influence of keyboarding versus handwriting in a high-stakes writing assessment. Conclusions are based on data collected from a pilot project to move Old Dominion University's Exit Exam of Writing Proficiency from a handwritten format into a dual-option format (i.e., the students may choose to handwrite or keyboard the…

Descriptors: Writing Evaluation, Handwriting, Pilot Projects, Writing Tests

Assessor Training: Its Effects on Criterion-Based Assessment in a Medical Context

Pell, Godfrey; Homer, Matthew S.; Roberts, Trudie E. – International Journal of Research & Method in Education, 2008

Increasingly, academic institutions are being required to improve the validity of the assessment process; unfortunately, often this is at the expense of reliability. In medical schools (such as Leeds), standardized tests of clinical skills, such as "Objective Structured Clinical Examinations" (OSCEs) are widely used to assess clinical…

Descriptors: Medical Education, Standardized Tests, Clinical Experience, Criterion Referenced Tests
