NotesFAQContact Us
Collection
Advanced
Search Tips
Laws, Policies, & Programs
What Works Clearinghouse Rating
Showing 1 to 15 of 53 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Hunter, Seth B. – Journal of Education Human Resources, 2023
Teacher performance scores inform education leaders' management of teacher human resources. However, prior research has implied that different interpretations of performance criteria between teachers and their evaluators suppress teacher development. Although research has examined teacher perceptions of performance scores and compared teacher…
Descriptors: Teacher Evaluation, Teacher Effectiveness, Self Evaluation (Individuals), Interrater Reliability
Peer reviewed Peer reviewed
Direct linkDirect link
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Peer reviewed Peer reviewed
Direct linkDirect link
Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki – Studies in Second Language Acquisition, 2023
Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward…
Descriptors: Second Language Learning, Second Language Instruction, Interrater Reliability, Speech Communication
Peer reviewed Peer reviewed
Direct linkDirect link
Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020
Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…
Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Bosch, Nigel; Paquette, Luc – Journal of Learning Analytics, 2018
Metrics including Cohen's kappa, precision, recall, and F[subscript 1] are common measures of performance for models of discrete student states, such as a student's affect or behaviour. This study examined discrete model metrics for previously published student model examples to identify situations where metrics provided differing perspectives on…
Descriptors: Models, Comparative Analysis, Prediction, Probability
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Sumner, Josh – Research-publishing.net, 2021
Comparative Judgement (CJ) has emerged as a technique that typically makes use of holistic judgement to assess difficult-to-specify constructs such as production (speaking and writing) in Modern Foreign Languages (MFL). In traditional approaches, markers assess candidates' work one-by-one in an absolute manner, assigning scores to different…
Descriptors: Holistic Approach, Student Evaluation, Comparative Analysis, Decision Making
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Lanah Stafford; Erin Cousins; Linda Bol; Megan Mize – Research & Practice in Assessment, 2023
Integrative learning is an important outcome for graduates of higher education. Therefore, it should be well-defined and assessed reliably. The American Association of Colleges & Universities has developed a rubric to define and assess integrative learning, but it has low reliability. This pilot study examines whether this rubric's reliability…
Descriptors: Scoring Rubrics, Reliability, Evaluation Methods, Faculty Development
Peer reviewed Peer reviewed
Direct linkDirect link
Leaman, Marion C.; Edmonds, Lisa A. – Journal of Speech, Language, and Hearing Research, 2021
Purpose: This study evaluated interrater reliability (IRR) and test-retest stability (TRTS) of seven linguistic measures (percent correct information units, relevance, subject-verb-[object], complete utterance, grammaticality, referential cohesion, global coherence), and communicative success in unstructured conversation and in a story narrative…
Descriptors: Aphasia, Psychometrics, Correlation, Speech Language Pathology
Peer reviewed Peer reviewed
Direct linkDirect link
Guo, Xiuyan; Lei, Pui-Wa – International Journal of Testing, 2020
Little research has been done on the effects of peer raters' quality characteristics on peer rating qualities. This study aims to address this gap and investigate the effects of key variables related to peer raters' qualities, including content knowledge, previous rating experience, training on rating tasks, and rating motivation. In an experiment…
Descriptors: Peer Evaluation, Error Patterns, Correlation, Knowledge Level
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Linlin, Cao – English Language Teaching, 2020
Through Many-Facet Rasch analysis, this study explores the rating differences between 1 computer automatic rater and 5 expert teacher raters on scoring 119 students in a computerized English listening-speaking test. Results indicate that both automatic and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…
Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning
Peer reviewed Peer reviewed
Direct linkDirect link
Robertson, Clare; Ramsay, Craig; Gurung, Tara; Mowatt, Graham; Pickard, Robert; Sharma, Pawana – Research Synthesis Methods, 2014
We describe our experience of using a modified version of the Cochrane risk of bias (RoB) tool for randomised and non-randomised comparative studies. Objectives: (1) To assess time to complete RoB assessment; (2) To assess inter-rater agreement; and (3) To explore the association between RoB and treatment effect size. Methods: Cochrane risk of…
Descriptors: Risk, Randomized Controlled Trials, Research Design, Comparative Analysis
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Schleigh, Sharon Price; Clark, Douglas B.; Menekse, Muhsin – International Journal of Education in Mathematics, Science and Technology, 2015
Although interview formats support rich data collection in conceptual change studies, interview formats limit sample sizes. This study explores the possibility of using constructed-response formats as an alternative or supplement for collecting similarly rich data across larger pools of subjects in conceptual change studies. While research in…
Descriptors: Interviews, Sample Size, Change, Concept Formation
Peer reviewed Peer reviewed
Direct linkDirect link
Heldsinger, Sandra A.; Humphry, Stephen M. – Educational Research, 2013
Background: Many in education argue for the importance of incorporating teacher judgements in the assessment and reporting of student performance. Advocates of such an approach are cognisant, though, that obtaining a satisfactory level of consistency in teacher judgements poses a challenge. Purpose: This study investigates the extent to which the…
Descriptors: Evaluation Methods, Student Evaluation, Teacher Attitudes, Comparative Analysis
Collier, Lizabeth C. – ProQuest LLC, 2014
This study investigates how university instructors from various disciplines at a large, comprehensive university in the United States evaluate different varieties of English from countries considered "outer circle" (OC) countries, formerly colonized countries where English has been transplanted and is now used unofficially and officially…
Descriptors: Universities, Global Approach, College English, Writing Evaluation
Previous Page | Next Page »
Pages: 1  |  2  |  3  |  4