Showing 1 to 15 of 25 results
Peer reviewed
Yuan Tian; Xi Yang; Suhail A. Doi; Luis Furuya-Kanamori; Lifeng Lin; Joey S. W. Kwong; Chang Xu – Research Synthesis Methods, 2024
RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding the risk of bias assessment based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two…
Descriptors: Risk, Randomized Controlled Trials, Classification, Robotics
Yicheng Sun – ProQuest LLC, 2024
We study how to automatically generate cloze questions from given texts to assess reading comprehension, where a cloze question consists of a stem with a blank space holder for the answer key, and three distractors for generating confusions. We present a generative method called CQG (Cloze Question Generator) for constructing cloze questions from…
Descriptors: Cloze Procedure, Reading Processes, Questioning Techniques, Computational Linguistics
Peer reviewed
Kevin C. Haudek; Xiaoming Zhai – International Journal of Artificial Intelligence in Education, 2024
Argumentation, a key scientific practice presented in the "Framework for K-12 Science Education," requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open response assessments, leveraging…
Descriptors: Accuracy, Persuasive Discourse, Artificial Intelligence, Learning Management Systems
Reagan Mozer; Luke Miratrix; Jackie Eunjung Relyea; James S. Kim – Journal of Educational and Behavioral Statistics, 2024
In a randomized trial that collects text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by human raters. An impact analysis can then be conducted to compare treatment and control groups, using the hand-coded scores as a measured outcome. This…
Descriptors: Scoring, Evaluation Methods, Writing Evaluation, Comparative Analysis
Leech, Tony; Chambers, Lucy – Research Matters, 2022
Two of the central issues in comparative judgement (CJ), which are perhaps underexplored compared to questions of the method's reliability and technical quality, are "what processes do judges use to make their decisions" and "what features do they focus on when making their decisions?" This article discusses both, in the…
Descriptors: Comparative Analysis, Decision Making, Evaluators, Reliability
Peer reviewed
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability
Peer reviewed
Lloyd-Cox, James; Pickering, Alan; Bhattacharya, Joydeep – Creativity Research Journal, 2022
According to the standard definition, creative ideas must be both novel and useful. While a handful of recent studies suggest that novelty is more important than usefulness to evaluations of creativity, little is known about the contextual and interpersonal factors that affect how people weigh these two components when making an overall creativity…
Descriptors: Creativity, Personality Traits, Decision Making, Evaluators
Peer reviewed
Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020
Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…
Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials
Peer reviewed
Finch, Holmes – Practical Assessment, Research & Evaluation, 2022
Researchers in many disciplines work with ranking data. This data type is unique in that it is often deterministic in nature (the ranks of items "k"-1 determine the rank of item "k"), and the difference in a pair of rank scores separated by "k" units is equivalent regardless of the actual values of the two ranks in…
Descriptors: Data Analysis, Statistical Inference, Models, College Faculty
Peer reviewed
Akbari, Alireza; Shahnazari, Mohammadtaghi – Language Testing in Asia, 2019
The present research paper introduces a translation evaluation method called Calibrated Parsing Items Evaluation (CPIE hereafter). This evaluation method maximizes translators' performance through identifying the parsing items with an optimal p-docimology and d-index (item discrimination). This method checks all the possible parses (annotations)…
Descriptors: Test Items, Translation, Computer Software, Evaluators
Peer reviewed
Han, Chao; Lu, Xiaolei – Computer Assisted Language Learning, 2023
The use of translation and interpreting (T&I) in the language learning classroom is commonplace, serving various pedagogical and assessment purposes. Previous utilization of T&I exercises is driven largely by their potential to enhance language learning, whereas the latest trend has begun to underscore T&I as a crucial skill to be…
Descriptors: Translation, Computational Linguistics, Correlation, Language Processing
Peer reviewed
Bahi, Halima; Necibi, Khaled – International Journal of Computer-Assisted Language Learning and Teaching, 2020
Pronunciation teaching is an important stage in language learning activities. This article tackles the pronunciation scoring problem where research has demonstrated relatively low human-human and low human-machine agreement rates, which makes teachers skeptical about their relevance. To overcome these limitations, a fuzzy combination of two…
Descriptors: Oral Reading, Reading Fluency, Pronunciation, Learning Activities
Peer reviewed
Kovalkov, Anastasia; Paassen, Benjamin; Segal, Avi; Gal, Kobi; Pinkwart, Niels – International Educational Data Mining Society, 2021
Promoting creativity is considered an important goal of education, but creativity is notoriously hard to define and measure. In this paper, we make the journey from defining a formal measure of creativity to applying the measure in a practical domain. The measure relies on core theoretical concepts in creativity theory, namely fluency, flexibility, and…
Descriptors: Creativity, Theory Practice Relationship, Evaluators, Specialists
Peer reviewed
Linlin, Cao – English Language Teaching, 2020
Through Many-Facet Rasch analysis, this study explores the rating differences between 1 automatic computer rater and 5 expert teacher raters in scoring 119 students on a computerized English listening-speaking test. Results indicate that both the automatic rater and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…
Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning
Ramachandran, Lakshmi – ProQuest LLC, 2013
Relevance helps identify to what extent a review's content pertains to that of the submission. A relevance metric helps distinguish generic or vague reviews from useful ones. Relevance of a review to a submission can be determined by identifying semantic and syntactic similarities between them. Our work introduces the use of a word-order graph…
Descriptors: Evaluation, Evaluators, Semantics, Word Order