Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 8 |
Since 2016 (last 10 years) | 15 |
Since 2006 (last 20 years) | 24 |
Descriptor
Evaluation Methods | 31 |
Evaluators | 31 |
Scores | 31 |
Interrater Reliability | 10 |
Comparative Analysis | 8 |
Foreign Countries | 8 |
Reliability | 6 |
Scoring | 6 |
Second Language Learning | 6 |
Teacher Evaluation | 6 |
Computer Software | 5 |
Author
Bocala, Candice | 2 |
Chang, Quincy | 2 |
Lacireno-Paquet, Natalie | 2 |
Riordan, Julie | 2 |
Shakman, Karen | 2 |
Abedi, Jamal | 1 |
Attali, Yigal | 1 |
Bazeley, Patricia | 1 |
Beltyukova, Svetlana A. | 1 |
Cooper, Harris | 1 |
Davis, Stephen | 1 |
Audience
Policymakers | 2 |
Parents | 1 |
Researchers | 1 |
Assessments and Surveys
National Assessment of… | 1 |
Torrance Tests of Creative… | 1 |
Attali, Yigal – ETS Research Report Series, 2020
Principles of skill acquisition dictate that raters should be provided with frequent feedback about their ratings. However, in current operational practice, raters rarely receive immediate feedback about their scores owing to the prohibitive effort required to generate such feedback. An approach for generating and administering feedback responses…
Descriptors: Feedback (Response), Evaluators, Accuracy, Scores
Jamie L. Thompson – ProQuest LLC, 2023
Research indicates that teacher performance is a critical focus for school districts, administrators, and teachers. Pre-service teacher preparation, teacher retention, job satisfaction, mentoring, continuous feedback, and onboarding support for new teachers are all factors that influence teacher performance (Carver-Thomas & Darling-Hammond,…
Descriptors: Teacher Evaluation, Teacher Effectiveness, Evaluation Methods, Feedback (Response)
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability
Nagle, Charlie L.; Trofimovich, Pavel; O'Brien, Mary Grantham; Kennedy, Sara – Modern Language Journal, 2022
Comprehensibility has emerged as a useful and intuitive means of globally evaluating second language (L2) speakers in many research and instructional contexts. In most cases, L2 speakers' comprehensibility is assessed by external listeners who do not engage in extensive communication with the speakers, even though the degree to which a speaker is…
Descriptors: Evaluators, Intelligibility, Pronunciation, Task Analysis
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Gingerich, Andrea; Schokking, Edward; Yeates, Peter – Advances in Health Sciences Education, 2018
Recent literature places more emphasis on assessment comments rather than relying solely on scores. Both are variable, however, emanating from assessment judgements. One established source of variability is "contrast effects": scores are shifted away from the depicted level of competence in a preceding encounter. The shift could arise…
Descriptors: Evaluation Methods, Scores, Intervention, Schemata (Cognition)
Karusoo-Musumeci, Ava; Pearce, Wendy M.; Donaghy, Michelle – Child Language Teaching and Therapy, 2022
Oral narrative assessments are important for diagnosis of language disorders in school-age children so scoring needs to be reliable and consistent. This study explored the impact of training on the variability of story grammar scores in children's oral narrative assessments scored by multiple raters. Fifty-one speech pathologists and 19 final-year…
Descriptors: Oral Language, Speech Evaluation, Language Impairments, Elementary School Students
Thai, Thuy; Sheehan, Susan – Language Education & Assessment, 2022
In language performance tests, raters are important as their scoring decisions determine which aspects of performance the scores represent; however, raters are considered as one of the potential sources contributing to unwanted variability in scores (Davis, 2012). Although a great number of studies have been conducted to unpack how rater…
Descriptors: Rating Scales, Speech Communication, Second Language Learning, Second Language Instruction
Griesbach, Jan – ProQuest LLC, 2019
The purpose of this quantitative, repeated measure, ex-post facto study was to investigate the trajectory of principals' ability to develop and maintain the knowledge to accurately rate teaching practices in eight of ten observable components on the "Framework for Teaching" (Danielson, 2013). Principals' scores on initial proficiency…
Descriptors: Principals, Reliability, Teacher Evaluation, Scores
Seth B. Hunter; Matthew P. Steinberg – Annenberg Institute for School Reform at Brown University, 2024
Districts nationwide have increased the frequency of teacher evaluations. Yet, we know little about the role of evaluator feedback for teacher improvement. Using unique classroom observation-level data, we use evaluator ratings and teacher self-assessments of teacher performance to rigorously examine (positive and negative) feedback valence from…
Descriptors: Teacher Evaluation, Evaluation Methods, Feedback (Response), Teacher Improvement
Kovalkov, Anastasia; Paassen, Benjamin; Segal, Avi; Gal, Kobi; Pinkwart, Niels – International Educational Data Mining Society, 2021
Promoting creativity is considered an important goal of education, but creativity is notoriously hard to define and measure. In this paper, we make the journey from defining a formal measure of creativity to applying the measure in a practical domain. The measure relies on core theoretical concepts in creativity theory, namely fluency, flexibility, and…
Descriptors: Creativity, Theory Practice Relationship, Evaluators, Specialists
Linlin, Cao – English Language Teaching, 2020
Through Many-Facet Rasch analysis, this study explores the rating differences between 1 computer automatic rater and 5 expert teacher raters on scoring 119 students in a computerized English listening-speaking test. Results indicate that both the automatic rater and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…
Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Zhang, Bo; Xiao, Yunnan; Luo, Juan – Language Testing in Asia, 2015
Previous studies comparing holistic scoring to analytic scoring of second language writing have given mixed results. Some of them suffer from methodological drawbacks, such as limited writing sample size, limited number of raters, and lack of direct comparison of the two methods. Based on 300 writing samples graded by 14 raters, this research…
Descriptors: Evaluators, Reliability, Scores, Holistic Approach
Szarkowska, Agnieszka; Krejtz, Krzysztof; Dutka, Lukasz; Pilipczuk, Olga – Interpreter and Translator Trainer, 2018
In this study, we examined whether interpreters and interpreting trainees are better predisposed to respeaking than people with no interpreting skills. We tested 57 participants (22 interpreters, 23 translators and 12 controls) while respeaking 5-minute videos with two parameters: speech rate (fast/slow) and number of speakers (one/many). Having…
Descriptors: Translation, Comparative Analysis, Professional Personnel, Video Technology