Ilhan, Mustafa – International Journal of Assessment Tools in Education, 2019
This study investigated the effectiveness of statistical adjustments applied to rater bias in many-facet Rasch analysis. A dataset that did not include "rater × examinee" bias was first modified to introduce such bias. Bias adjustment was then applied to the rater bias included in the data file,…
Descriptors: Statistical Analysis, Item Response Theory, Evaluators, Bias
Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018
The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on…
Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability
Conger, Anthony J. – Educational and Psychological Measurement, 2017
Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…
Descriptors: Interrater Reliability, Evaluators, Accuracy, Statistical Analysis
Rettore, Enrico; Rocco, Lorenzo; Dal Maso, Carlo – Education Economics, 2018
We evaluate two reforms that modified the procedures of recruitment and promotion in Italian academia to balance the preeminent role of the recruiting school and to counter nepotism. We theoretically derive the decision rule of the evaluation committees and test it against data including information from all selections to associate and full…
Descriptors: Foreign Countries, College Faculty, Faculty Promotion, Teacher Recruitment
White, Lisa – ProQuest LLC, 2017
Although used in the corporate world for decades, using a multi-rater tool to evaluate school leaders began relatively recently. With states seeking flexibility from the "Elementary and Secondary Education Act of 1965" (reauthorized as the "No Child Left Behind Act of 2001"), the requirement to develop and implement principal…
Descriptors: Principals, Administrator Evaluation, Surveys, Self Evaluation (Individuals)
Hicks, Tyler; Rodríguez-Campos, Liliana; Choi, Jeong Hoon – American Journal of Evaluation, 2018
To begin statistical analysis, Bayesians quantify their confidence in modeling hypotheses with priors. A prior describes the probability of a certain modeling hypothesis apart from the data. Bayesians should be able to defend their choice of prior to a skeptical audience. Collaboration between evaluators and stakeholders could make their choices…
Descriptors: Bayesian Statistics, Evaluation Methods, Statistical Analysis, Hypothesis Testing
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Lamprianou, Iasonas – Educational and Psychological Measurement, 2018
It is common practice for assessment programs to organize qualifying sessions during which the raters (often known as "markers" or "judges") demonstrate their consistency before operational rating commences. Because of the high-stakes nature of many rating activities, the research community tends to continuously explore new…
Descriptors: Social Networks, Network Analysis, Comparative Analysis, Innovation
Yun, Jiyeo – ProQuest LLC, 2017
Since researchers began investigating automatic scoring systems in writing assessments, they have examined the relationships between human and machine scoring and have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…
Descriptors: Interrater Reliability, Essays, Scoring, Evaluators
Morris, Darrell; Pennell, Ashley M.; Perney, Jan; Trathen, Woodrow – Reading Psychology, 2018
This study compared reading rate to reading fluency (as measured by a rating scale). After listening to first graders read short passages, we assigned an overall fluency rating (low, average, or high) to each reading. We then used predictive discriminant analyses to determine which of five measures--accuracy, rate (objective); accuracy, phrasing,…
Descriptors: Reading Fluency, Prediction, Grade 1, Elementary School Students
Ebuoh, Casmir N. – World Journal of Education, 2018
The literature revealed that the patterns (methods) of scoring essay tests had been criticized as unreliable, and that this unreliability is likely to be greater in internal examinations than in external examinations. The purpose of this study is to find out the effects of analytical and holistic scoring patterns on scorer reliability in…
Descriptors: Holistic Approach, Scoring, Essay Tests, Biology
Al-Harthi, Aisha Salim Ali; Campbell, Chris; Karimi, Arafeh – Computers in the Schools, 2018
This study aimed to develop, validate, and trial a rubric for evaluating the cloud-based learning designs (CBLD) that were developed by teachers using virtual learning environments. The rubric was developed using the technological pedagogical content knowledge (TPACK) framework, with rubric development including content and expert validation of…
Descriptors: Computer Assisted Instruction, Scoring Rubrics, Interrater Reliability, Content Validity
Silvey, Brian A.; Wacker, Aaron T.; Felder, Logan – International Journal of Music Education, 2017
The purpose of this study was to investigate the effects of baton usage on college musicians' perceptions of ensemble performance. Two conductors were videotaped while conducting a 1-minute excerpt from either a technical ("Pathfinder of Panama," John Philip Sousa) or lyrical ("Seal Lullaby," Eric Whitacre) piece of concert…
Descriptors: Musicians, College Students, Student Attitudes, Music Activities
Watson, Mary Katherine; Barrella, Elise; Wall, Thomas A.; Noyes, Caroline; Rodgers, Michael – Advances in Engineering Education, 2017
As engineering programs have begun to infuse sustainability into their undergraduate curricula, assessment tools are needed to further inform these reform efforts. The goal of this project was to demonstrate the use of a new rubric to examine students' abilities to engage in sustainable design. The rubric includes 16 sustainable design criteria…
Descriptors: Design, Sustainable Development, Engineering Education, Scoring Rubrics
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators