Showing 1 to 15 of 84 results
Peer reviewed
Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025
In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…
Descriptors: Automation, Grading, Computer Assisted Testing, Scoring
Peer reviewed
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Peer reviewed
Jennifer Sdunzik; Ann M. Bessenbacher; Wilella D. Burgess; Asia M. Mohamud; Abdirisak Dalmar – American Journal of Evaluation, 2025
The success of development projects and evaluations hinges on having access to research protocols and methodologies that consider the needs and characteristics of stakeholders, subjects, and context while remaining rigorous and culturally sound. These efforts are often complicated by a dearth of tools that have been tested for validity and…
Descriptors: Foreign Countries, Program Evaluation, International Programs, Data Collection
Peer reviewed
Wolkowitz, Amanda A. – Journal of Educational Measurement, 2021
Decision consistency (DC) is the reliability of a classification decision based on a test score. In professional credentialing, the decision is often a high-stakes pass/fail decision. The current methods for estimating DC are computationally complex. The purpose of this research is to provide a computationally and conceptually simple method for…
Descriptors: Decision Making, Reliability, Classification, Scores
Peer reviewed
Karel Kok; Sophia Chroszczinsky; Burkhard Priemer – Physical Review Physics Education Research, 2024
Data comparison problems are used in teaching and science education research that focuses on students' ability to compare datasets and their conceptual understanding of measurement uncertainties. However, the evaluation of students' decisions in these problems can pose a problem: e.g., students making a correct decision for the wrong reasons.…
Descriptors: Secondary School Students, Undergraduate Students, Comparative Analysis, Evaluation Methods
Peer reviewed
Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025
Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
Gill, Tim – Research Matters, 2022
In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them in terms of which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…
Descriptors: Comparative Analysis, Decision Making, Scripts, Standards
National Center on Improving Literacy, 2022
There are many available screeners for reading and other education or social-emotional outcomes. This brief outlines important things to consider when choosing and using a screener.
Descriptors: Screening Tests, Literacy, Social Emotional Learning, Decision Making
Peer reviewed
W. Wisanti; Siti Zubaidah; Sri Rahayu Lestari; Novita Kartika Indah; Eva Kristinawati Putri – Journal of Biological Education Indonesia (Jurnal Pendidikan Biologi Indonesia), 2023
An identification key is one of the tools used to determine the identity of a plant specimen. This research aims to design an identification key for "M. crenata" and analyze its potential as an identification tool. The research uses an observational descriptive method. The identification keys were designed for populations growing in…
Descriptors: Biology, Science Instruction, Visual Aids, Plants (Botany)
Leech, Tony; Chambers, Lucy – Research Matters, 2022
Two of the central issues in comparative judgement (CJ), which are perhaps underexplored compared to questions of the method's reliability and technical quality, are "what processes do judges use to make their decisions" and "what features do they focus on when making their decisions?" This article discusses both, in the…
Descriptors: Comparative Analysis, Decision Making, Evaluators, Reliability
Peer reviewed
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability
Peer reviewed
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Peer reviewed
Mazzurco, Andrea; Jesiek, Brent K.; Godwin, Allison – Journal of Civil Engineering Education, 2020
Due to globalization trends, engineers are increasingly expected to work effectively across national and cultural boundaries. However, there remains a lack of valid and reliable measures of global engineering competency. To address this gap, the research team has undertaken a large-scale research project to develop a suite of instruments to…
Descriptors: Engineering Education, Decision Making, Measures (Individuals), Trend Analysis
Peer reviewed
Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki – Studies in Second Language Acquisition, 2023
Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward…
Descriptors: Second Language Learning, Second Language Instruction, Interrater Reliability, Speech Communication
Peer reviewed
Marc T. Braverman – Journal of Human Sciences & Extension, 2019
This article examines the concept of credible evidence in Extension evaluations with specific attention to the measures and measurement strategies used to collect and create data. Credibility depends on multiple factors, including data quality and methodological rigor, characteristics of the stakeholder audience, stakeholder beliefs about the…
Descriptors: Extension Education, Program Evaluation, Evaluation Methods, Planning