Showing 1 to 15 of 327 results
Peer reviewed
Kapsner-Smith, Mara R.; Opuszynski, Amanda; Stepp, Cara E.; Eadie, Tanya L. – Journal of Speech, Language, and Hearing Research, 2021
Purpose: The reliability of auditory-perceptual judgments between listeners is a long-standing problem in the assessment of voice disorders. The purpose of this study was to determine whether a relatively novel experimental scaling method, called visual sort and rate (VSR), yielded stronger reliability than the more frequently used method of…
Descriptors: Voice Disorders, Interrater Reliability, Rating Scales, Severity (of Disability)
Peer reviewed
Taylor, Tessa; Lanovaz, Marc J. – Journal of Applied Behavior Analysis, 2022
Behavior analysts typically rely on visual inspection of single-case experimental designs to make treatment decisions. However, visual inspection is subjective, which has led to the development of supplemental objective methods such as the conservative dual-criteria method. To replicate and extend a study conducted by Wolfe et al. (2018) on the…
Descriptors: Visual Perception, Artificial Intelligence, Decision Making, Evaluators
Peer reviewed
Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025
As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…
Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy
Peer reviewed
Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022
In many performance assessments, one or two raters from the complete rater pool score each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…
Descriptors: Evaluators, Bias, Identification, Performance Based Assessment
Peer reviewed
Hunter, Seth B. – Journal of Education Human Resources, 2023
Teacher performance scores inform education leaders' management of teacher human resources. However, prior research has implied that different interpretations of performance criteria between teachers and their evaluators suppress teacher development. Although research has examined teacher perceptions of performance scores and compared teacher…
Descriptors: Teacher Evaluation, Teacher Effectiveness, Self Evaluation (Individuals), Interrater Reliability
Heather Raithel – ProQuest LLC, 2023
A mixed methods action research study was designed to answer three research questions based on inter-rater reliability (IRR) in compliance calls for transition at a state education agency, perceived confidence levels in making and discussing compliance calls, and perceived confidence in sharing transition resources. An innovation based on…
Descriptors: Public Agencies, Interrater Reliability, Compliance (Legal), Comparative Analysis
Peer reviewed
Whalen, Kate; Paez, Antonio – Journal of Geography, 2022
Experiential education partnered with guided reflection is thought to support students with higher-order thinking skills. In this study, 44 reflections from two university-level sustainability courses were compared. In both courses students were asked to write a reflection, but only one course used the Reflective Learning Framework (RLF). Tests of…
Descriptors: Geography Instruction, Thinking Skills, Experiential Learning, Sustainability
Peer reviewed
Tülübas, Tijen; Demirkol, Murat; Ozdemir, Tuncay Yavuz; Polat, Hakan; Karakose, Turgut; Yirci, Ramazan – Educational Process: International Journal, 2023
Background/purpose: ChatGPT, a recent form of AI-based language model, has garnered interest among people from diverse backgrounds with its immersive capabilities. Using ChatGPT to support or generate scientific research has also created an ongoing debate over its advantages versus risks. The present study aimed to conduct an AI-enabled research…
Descriptors: Artificial Intelligence, Emergency Programs, Distance Education, COVID-19
Peer reviewed
De Raadt, Alexandra; Warrens, Matthijs J.; Bosker, Roel J.; Kiers, Henk A. L. – Educational and Psychological Measurement, 2019
Cohen's kappa coefficient is commonly used for assessing agreement between classifications of two raters on a nominal scale. Three variants of Cohen's kappa that can handle missing data are presented. Data are considered missing if one or both ratings of a unit are missing. We study how well the variants estimate the kappa value for complete data…
Descriptors: Interrater Reliability, Data, Statistical Analysis, Statistical Bias
Jiyeo Yun – English Teaching, 2023
Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
Peer reviewed
Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024
This paper proposes using depth perception to represent raters' decisions in the holistic evaluation of ESL essays, as an alternative to the conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…
Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy
Peer reviewed
Arslan Mancar, Sinem; Gulleroglu, H. Deniz – International Journal of Assessment Tools in Education, 2022
The aim of this study is to analyse the importance of the number of raters and compare the results obtained by techniques based on Classical Test Theory (CTT) and Generalizability (G) Theory. The Kappa and Krippendorff alpha techniques based on CTT were used to determine the inter-rater reliability. In this descriptive research, the data consist of…
Descriptors: Comparative Analysis, Interrater Reliability, Advanced Placement, Scoring Rubrics
Peer reviewed
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020
In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al. (2020) describe a method…
Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level
Peer reviewed
Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki – Studies in Second Language Acquisition, 2023
Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward…
Descriptors: Second Language Learning, Second Language Instruction, Interrater Reliability, Speech Communication