Publication Date
  In 2025: 2
  Since 2024: 4
  Since 2021 (last 5 years): 8
  Since 2016 (last 10 years): 13
  Since 2006 (last 20 years): 23
Author
  Clariana, Roy B.: 2
  Amanda Huee-Ping Wong: 1
  Aydin, Belgin: 1
  Aydin, Selami: 1
  Braithwaite, Nicholas St. J.: 1
  Burk, John: 1
  Danish, Joshua A.: 1
  DeLiema, David: 1
  Enyedy, Noel: 1
  Garcia, Veronica: 1
  Hamid Mohammadi: 1
Publication Type
  Journal Articles: 22
  Reports - Research: 18
  Reports - Evaluative: 3
  Tests/Questionnaires: 3
  Collected Works - Proceedings: 1
  Reports - Descriptive: 1
Education Level
  Higher Education: 23
  Postsecondary Education: 23
  High Schools: 2
  Secondary Education: 2
  Elementary Secondary Education: 1
Location
  Turkey: 4
  China: 2
  Singapore: 2
  South Korea: 2
  Taiwan: 2
  Asia: 1
  Australia: 1
  Brazil: 1
  China (Beijing): 1
  Connecticut: 1
  Denmark: 1
Laws, Policies, & Programs
  Pell Grant Program: 1
Assessments and Surveys
  ACT Assessment: 1
  Graduate Management Admission…: 1
  Test of English as a Foreign…: 1
Jonas Flodén – British Educational Research Journal, 2025
This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs when grading university exams relative to human teachers. Aspects investigated include consistency, large discrepancies, and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…
Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring
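The snippet above does not describe the study's grading pipeline, so the following is only a minimal sketch of how an LLM might be prompted to score an exam answer against a rubric. The model name and the grade_answer helper are assumptions for illustration, not the author's setup.

    # Illustrative only: ask an LLM to grade one exam answer against a rubric.
    # Assumes the openai Python package (v1+) and OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()

    def grade_answer(question: str, rubric: str, answer: str, max_points: int) -> int:
        """Hypothetical helper: return an integer score for one answer."""
        prompt = (
            f"Question:\n{question}\n\nRubric:\n{rubric}\n\n"
            f"Student answer:\n{answer}\n\n"
            f"Return only an integer score from 0 to {max_points}."
        )
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model; the study used ChatGPT
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # lower run-to-run variation when checking consistency
        )
        # Assumes the model complies and returns a bare integer.
        return int(response.choices[0].message.content.strip())

Running the same answer through such a function several times is one simple way to probe the consistency questions the study raises.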
Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
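The study's exact architecture is not given in the snippet; as a minimal sketch of the general embedding-plus-regressor idea behind such multilingual AES systems (assuming the sentence-transformers and scikit-learn packages), essays could be encoded with LaBSE and fed to a simple score regressor.

    # Minimal AES sketch: LaBSE sentence embeddings feeding a ridge regressor.
    # Illustrative assumption, not the system evaluated in the study.
    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import Ridge

    encoder = SentenceTransformer("sentence-transformers/LaBSE")  # language-agnostic embeddings

    def train_scorer(essays: list[str], scores: list[float]) -> Ridge:
        X = encoder.encode(essays)              # one embedding vector per essay
        return Ridge(alpha=1.0).fit(X, scores)  # map embeddings to human scores

    def predict_scores(model: Ridge, essays: list[str]) -> list[float]:
        return model.predict(encoder.encode(essays)).tolist()

Because LaBSE embeds many languages into one space, the same scorer can, in principle, be applied to German, Italian, and Czech essays alike.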
On-Soon Lee – Journal of Pan-Pacific Association of Applied Linguistics, 2024
Despite the increasing interest in using AI tools as assistant agents in instructional settings, the effectiveness of ChatGPT, the generative pretrained AI, for evaluating the accuracy of second language (L2) writing has been largely unexplored in formative assessment. Therefore, the current study aims to examine how ChatGPT, as an evaluator,…
Descriptors: Foreign Countries, Undergraduate Students, English (Second Language), Second Language Learning
Parker, Mark A. J.; Hedgeland, Holly; Jordan, Sally E.; Braithwaite, Nicholas St. J. – European Journal of Science and Mathematics Education, 2023
The study covers the development and testing of the alternative mechanics survey (AMS), a modified force concept inventory (FCI), which used automatically marked free-response questions. Data were collected over a period of three academic years from 611 participants who were taking physics classes at high school and university level. A total of…
Descriptors: Test Construction, Scientific Concepts, Physics, Test Reliability
Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024
The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…
Descriptors: Accuracy, Reliability, Computational Linguistics, Standards
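The metrics used in this study are not listed in the snippet; a common way to quantify how closely AI-assigned grades track human grades is an agreement statistic such as quadratically weighted kappa alongside a correlation. The scores below are hypothetical.

    # Agreement between human and AI-assigned scores; illustrative metrics only.
    from sklearn.metrics import cohen_kappa_score
    from scipy.stats import pearsonr

    human_scores = [4, 3, 5, 2, 4, 3]   # hypothetical integer rubric scores
    ai_scores    = [4, 3, 4, 2, 5, 3]

    qwk = cohen_kappa_score(human_scores, ai_scores, weights="quadratic")
    r, _ = pearsonr(human_scores, ai_scores)
    print(f"Quadratic weighted kappa: {qwk:.2f}, Pearson r: {r:.2f}")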
Isler, Cemre; Aydin, Belgin – International Journal of Assessment Tools in Education, 2021
This study is about the development and validation process of the Computerized Oral Proficiency Test of English as a Foreign Language (COPTEFL). The test aims at assessing the speaking proficiency levels of students in Anadolu University School of Foreign Languages (AUSFL). For this purpose, three monologic tasks were developed based on the Global…
Descriptors: Test Construction, Construct Validity, Interrater Reliability, Scores
He, Tung-hsien – SAGE Open, 2019
This study employed a mixed-design approach and the Many-Facet Rasch Measurement (MFRM) framework to investigate whether rater bias occurred between the onscreen scoring (OSS) mode and the paper-based scoring (PBS) mode. Nine human raters analytically marked scanned scripts and paper scripts using a six-category (i.e., six-criterion) rating…
Descriptors: Computer Assisted Testing, Scoring, Item Response Theory, Essays
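For reference, the Many-Facet Rasch Measurement model that this and the following rater studies draw on is conventionally written (after Linacre) with examinee, criterion, rater, and category-threshold facets; further facets such as scoring mode can be added as additional terms, though how this particular study parameterised them is not stated in the snippet above.

    \[
      \log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
    \]
    % P_{nijk}: probability that examinee n receives category k (rather than k-1) from rater j on criterion i
    % B_n: examinee ability, D_i: criterion difficulty, C_j: rater severity, F_k: category threshold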
Linlin, Cao – English Language Teaching, 2020
Through Many-Facet Rasch analysis, this study explores the rating differences between one automated computer rater and five expert teacher raters in scoring 119 students in a computerized English listening-speaking test. Results indicate that both the automatic rater and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…
Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning
Wang, Yuqi; Ren, Wei – Language Learning Journal, 2022
Research on L2 pragmatics has explored the effects of different factors on different aspects of learners' pragmatic performance, but often not simultaneously. In addition, syntactic complexity is rarely examined in L2 pragmatics. This cross-sectional study aimed to conduct a multidimensional analysis to explore the effects of proficiency and study-abroad…
Descriptors: Pragmatics, Second Language Learning, Second Language Instruction, English (Second Language)
Smolinsky, Lawrence; Marx, Brian D.; Olafsson, Gestur; Ma, Yanxia A. – Journal of Educational Computing Research, 2020
Computer-based testing is an expanding use of technology offering advantages to teachers and students. We studied Calculus II classes for science, technology, engineering, and mathematics majors using different testing modes. Three sections with 324 students employed, respectively, paper-and-pencil testing, computer-based testing, and both. Computer tests gave…
Descriptors: Test Format, Computer Assisted Testing, Paper (Material), Calculus
Li, Shuai; Taguchi, Naoko; Xiao, Feng – Language Assessment Quarterly, 2019
Adopting Linacre's guidelines for evaluating rating scale effectiveness, we examined whether and how a six-point rating scale functioned differently across raters, speech acts, and second language (L2) proficiency levels. We developed a 12-item Computerized Oral Discourse Completion Task (CODCT) for assessing the production of requests, refusals,…
Descriptors: Speech Acts, Rating Scales, Guidelines, Evaluators
Kim, Kerry J.; Meir, Eli; Pope, Denise S.; Wendel, Daniel – Journal of Educational Data Mining, 2017
Computerized classification of student answers offers the possibility of instant feedback and improved learning. Open response (OR) questions provide greater insight into student thinking and understanding than more constrained multiple choice (MC) questions, but development of automated classifiers is more difficult, often requiring training a…
Descriptors: Classification, Computer Assisted Testing, Multiple Choice Tests, Test Format
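The study's own classifier is not described in the snippet; as a minimal sketch of the kind of supervised text classifier such work trains, a TF-IDF plus logistic regression pipeline in scikit-learn is a common baseline. The example responses and labels below are hypothetical.

    # Baseline open-response classifier: TF-IDF features + logistic regression.
    # Illustrative sketch, not the classifier developed in the study.
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    responses = ["the cell swells because water moves in", "no change occurs"]  # hypothetical answers
    labels    = ["osmosis", "misconception"]                                     # hypothetical categories

    classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                               LogisticRegression(max_iter=1000))
    classifier.fit(responses, labels)
    print(classifier.predict(["water enters and the cell expands"]))

In practice such classifiers need a labelled training set for each question, which is the development cost the abstract contrasts with multiple-choice items.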
DeLiema, David; Enyedy, Noel; Steen, Francis; Danish, Joshua A. – Cognition and Instruction, 2021
Gesture is recognized as part of and integral to cognition. The value of gesture for learning is contingent on how it gathers meaning against the ground of other relevant resources in the setting--in short, how the body is laminated onto the surrounding environment. With a focus on lamination, this paper formulates an integrated theory of…
Descriptors: Nonverbal Communication, Human Body, Schemata (Cognition), Spatial Ability
Razi, Salim – SAGE Open, 2015
Similarity reports of plagiarism detectors should be approached with caution as they may not be sufficient to support allegations of plagiarism. This study developed a 50-item rubric to simplify and standardize the evaluation of academic papers. In the spring semester of the 2011-2012 academic year, 161 freshmen's papers at the English Language Teaching…
Descriptors: Foreign Countries, Scoring Rubrics, Writing Evaluation, Writing (Composition)
Jamieson, Joan; Poonpon, Kornwipa – ETS Research Report Series, 2013
Research and development of a new type of scoring rubric for the integrated speaking tasks of "TOEFL iBT"® are described. These "analytic rating guides" could be helpful if tasks modeled after those in TOEFL iBT were used for formative assessment, a purpose which is different from TOEFL iBT's primary use for admission…
Descriptors: Oral Language, Language Proficiency, Scaling, Scores