ERIC - Search Results

Showing 1 to 15 of 25 results

Towards the Automatic Risk of Bias Assessment on Randomized Controlled Trials: A Comparison of RobotReviewer and Humans

Peer reviewed

Yuan Tian; Xi Yang; Suhail A. Doi; Luis Furuya-Kanamori; Lifeng Lin; Joey S. W. Kwong; Chang Xu – Research Synthesis Methods, 2024

RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding the risk of bias assessment based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two…

Descriptors: Risk, Randomized Controlled Trials, Classification, Robotics

Automatic Wordnet Construction and Its Application in Generating Distractors for Cloze Questions

Yicheng Sun – ProQuest LLC, 2024

We study how to automatically generate cloze questions from given texts to assess reading comprehension, where a cloze question consists of a stem with a blank space holder for the answer key, and three distractors for generating confusions. We present a generative method called CQG (Cloze Question Generator) for constructing cloze questions from…

Descriptors: Cloze Procedure, Reading Processes, Questioning Techniques, Computational Linguistics

Examining the Effect of Assessment Construct Characteristics on Machine Learning Scoring of Scientific Argumentation

Peer reviewed

Kevin C. Haudek; Xiaoming Zhai – International Journal of Artificial Intelligence in Education, 2024

Argumentation, a key scientific practice presented in the "Framework for K-12 Science Education," requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open response assessments, leveraging…

Descriptors: Accuracy, Persuasive Discourse, Artificial Intelligence, Learning Management Systems

Combining Human and Automated Scoring Methods in Experimental Assessments of Writing: A Case Study Tutorial

Peer reviewed
PDF on ERIC

Reagan Mozer; Luke Miratrix; Jackie Eunjung Relyea; James S. Kim – Journal of Educational and Behavioral Statistics, 2024

In a randomized trial that collects text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by human raters. An impact analysis can then be conducted to compare treatment and control groups, using the hand-coded scores as a measured outcome. This…

Descriptors: Scoring, Evaluation Methods, Writing Evaluation, Comparative Analysis

How Do Judges in Comparative Judgement Exercises Make Their Judgements?

Leech, Tony; Chambers, Lucy – Research Matters, 2022

Two of the central issues in comparative judgement (CJ), which are perhaps underexplored compared to questions of the method's reliability and technical quality, are "what processes do judges use to make their decisions" and "what features do they focus on when making their decisions?" This article discusses both, in the…

Descriptors: Comparative Analysis, Decision Making, Evaluators, Reliability

Crowdsourced Adaptive Comparative Judgment: A Community-Based Solution for Proficiency Rating

Peer reviewed

Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022

The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…

Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability

Evaluating Creativity: How Idea Context and Rater Personality Affect Considerations of Novelty and Usefulness

Peer reviewed

Lloyd-Cox, James; Pickering, Alan; Bhattacharya, Joydeep – Creativity Research Journal, 2022

According to the standard definition, creative ideas must be both novel and useful. While a handful of recent studies suggest that novelty is more important than usefulness to evaluations of creativity, little is known about the contextual and interpersonal factors that affect how people weigh these two components when making an overall creativity…

Descriptors: Creativity, Personality Traits, Decision Making, Evaluators

Comparing Machine and Human Reviewers to Evaluate the Risk of Bias in Randomized Controlled Trials

Peer reviewed

Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020

Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…

Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials

An Introduction to the Analysis of Ranked Response Data

Peer reviewed
PDF on ERIC

Finch, Holmes – Practical Assessment, Research & Evaluation, 2022

Researchers in many disciplines work with ranking data. This data type is unique in that it is often deterministic in nature (the ranks of items "k"-1 determine the rank of item "k"), and the difference in a pair of rank scores separated by "k" units is equivalent regardless of the actual values of the two ranks in…

Descriptors: Data Analysis, Statistical Inference, Models, College Faculty

Calibrated Parsing Items Evaluation: A Step towards Objectifying the Translation Assessment

Peer reviewed

Akbari, Alireza; Shahnazari, Mohammadtaghi – Language Testing in Asia, 2019

The present research paper introduces a translation evaluation method called Calibrated Parsing Items Evaluation (CPIE hereafter). This evaluation method maximizes translators' performance through identifying the parsing items with an optimal p-docimology and d-index (item discrimination). This method checks all the possible parses (annotations)…

Descriptors: Test Items, Translation, Computer Software, Evaluators

Can Automated Machine Translation Evaluation Metrics Be Used to Assess Students' Interpretation in the Language Learning Classroom?

Peer reviewed

Han, Chao; Lu, Xiaolei – Computer Assisted Language Learning, 2023

The use of translation and interpreting (T&I) in the language learning classroom is commonplace, serving various pedagogical and assessment purposes. Previous utilization of T&I exercises is driven largely by their potential to enhance language learning, whereas the latest trend has begun to underscore T&I as a crucial skill to be…

Descriptors: Translation, Computational Linguistics, Correlation, Language Processing

Fuzzy Logic Applied for Pronunciation Assessment

Peer reviewed

Bahi, Halima; Necibi, Khaled – International Journal of Computer-Assisted Language Learning and Teaching, 2020

Pronunciation teaching is an important stage in language learning activities. This article tackles the pronunciation scoring problem where research has demonstrated relatively low human-human and low human-machine agreement rates, which makes teachers skeptical about their relevance. To overcome these limitations, a fuzzy combination of two…

Descriptors: Oral Reading, Reading Fluency, Pronunciation, Learning Activities

Modeling Creativity in Visual Programming: From Theory to Practice

Peer reviewed
PDF on ERIC

Kovalkov, Anastasia; Paassen, Benjamin; Segal, Avi; Gal, Kobi; Pinkwart, Niels – International Educational Data Mining Society, 2021

Promoting creativity is considered an important goal of education, but creativity is notoriously hard to define and measure. In this paper, we make the journey from defining a formal measure of creativity to applying the measure in a practical domain. The measure relies on core theoretical concepts in creativity theory, namely fluency, flexibility, and…

Descriptors: Creativity, Theory Practice Relationship, Evaluators, Specialists

Comparison of Automatic and Expert Teachers' Rating of Computerized English Listening-Speaking Test

Peer reviewed
PDF on ERIC

Linlin, Cao – English Language Teaching, 2020

Through Many-Facet Rasch analysis, this study explores the rating differences between 1 computer automatic rater and 5 expert teacher raters on scoring 119 students in a computerized English listening-speaking test. Results indicate that both automatic and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…

Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning

Automated Assessment of Reviews

Ramachandran, Lakshmi – ProQuest LLC, 2013

Relevance helps identify to what extent a review's content pertains to that of the submission. A relevance metric helps distinguish generic or vague reviews from useful ones. Relevance of a review to a submission can be determined by identifying semantic and syntactic similarities between them. Our work introduces the use of a word-order graph…

Descriptors: Evaluation, Evaluators, Semantics, Word Order
