Ilhan, Mustafa – International Journal of Assessment Tools in Education, 2019
This study investigated the effectiveness of statistical adjustments applied to rater bias in many-facet Rasch analysis. A dataset that did not include "rater × examinee" bias was first modified to introduce such bias. Later, bias adjustment was applied to rater bias included in the data file,…
Descriptors: Statistical Analysis, Item Response Theory, Evaluators, Bias
Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018
The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on…
Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability
Conger, Anthony J. – Educational and Psychological Measurement, 2017
Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…
Descriptors: Interrater Reliability, Evaluators, Accuracy, Statistical Analysis
Rettore, Enrico; Rocco, Lorenzo; Dal Maso, Carlo – Education Economics, 2018
We evaluate two reforms that modified the procedures of recruitment and promotion in Italian academia to balance the preeminent role of the recruiting school and to counter nepotism. We theoretically derive the decision rule of the evaluation committees and test it against data including information from all selections to associate and full…
Descriptors: Foreign Countries, College Faculty, Faculty Promotion, Teacher Recruitment
Hicks, Tyler; Rodríguez-Campos, Liliana; Choi, Jeong Hoon – American Journal of Evaluation, 2018
To begin statistical analysis, Bayesians quantify their confidence in modeling hypotheses with priors. A prior describes the probability of a certain modeling hypothesis apart from the data. Bayesians should be able to defend their choice of prior to a skeptical audience. Collaboration between evaluators and stakeholders could make their choices…
Descriptors: Bayesian Statistics, Evaluation Methods, Statistical Analysis, Hypothesis Testing
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Lamprianou, Iasonas – Educational and Psychological Measurement, 2018
It is common practice for assessment programs to organize qualifying sessions during which the raters (often known as "markers" or "judges") demonstrate their consistency before operational rating commences. Because of the high-stakes nature of many rating activities, the research community tends to continuously explore new…
Descriptors: Social Networks, Network Analysis, Comparative Analysis, Innovation
Morris, Darrell; Pennell, Ashley M.; Perney, Jan; Trathen, Woodrow – Reading Psychology, 2018
This study compared reading rate to reading fluency (as measured by a rating scale). After listening to first graders read short passages, we assigned an overall fluency rating (low, average, or high) to each reading. We then used predictive discriminant analyses to determine which of five measures--accuracy, rate (objective); accuracy, phrasing,…
Descriptors: Reading Fluency, Prediction, Grade 1, Elementary School Students
Ebuoh, Casmir N. – World Journal of Education, 2018
The literature shows that patterns/methods of scoring essay tests have been criticized as unreliable, and that this unreliability is likely to be greater in internal examinations than in external examinations. The purpose of this study is to find out the effects of analytical and holistic scoring patterns on scorer reliability in…
Descriptors: Holistic Approach, Scoring, Essay Tests, Biology
Al-Harthi, Aisha Salim Ali; Campbell, Chris; Karimi, Arafeh – Computers in the Schools, 2018
This study aimed to develop, validate, and trial a rubric for evaluating the cloud-based learning designs (CBLD) that were developed by teachers using virtual learning environments. The rubric was developed using the technological pedagogical content knowledge (TPACK) framework, with rubric development including content and expert validation of…
Descriptors: Computer Assisted Instruction, Scoring Rubrics, Interrater Reliability, Content Validity
Silvey, Brian A.; Wacker, Aaron T.; Felder, Logan – International Journal of Music Education, 2017
The purpose of this study was to investigate the effects of baton usage on college musicians' perceptions of ensemble performance. Two conductors were videotaped while conducting a 1-minute excerpt from either a technical ("Pathfinder of Panama," John Philip Sousa) or lyrical ("Seal Lullaby," Eric Whitacre) piece of concert…
Descriptors: Musicians, College Students, Student Attitudes, Music Activities
Watson, Mary Katherine; Barrella, Elise; Wall, Thomas A.; Noyes, Caroline; Rodgers, Michael – Advances in Engineering Education, 2017
As engineering programs have begun to infuse sustainability into their undergraduate curricula, assessment tools are needed to further inform these reform efforts. The goal of this project was to demonstrate the use of a new rubric to examine students' abilities to engage in sustainable design. The rubric includes 16 sustainable design criteria…
Descriptors: Design, Sustainable Development, Engineering Education, Scoring Rubrics
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
Raamkumar, Aravind Sesagiri; Foo, Schubert; Pang, Natalie – Information Research: An International Electronic Journal, 2016
Introduction: This paper looks at the issue of inadequate and omitted citations in manuscripts by collecting the experiential opinions of researchers from the dual perspectives of manuscript reviewers and authors. Method: An online survey was conducted with participation from 207 respondents who had experience of reviewing and authoring research…
Descriptors: Citations (References), Publications, Literature Reviews, Writing (Composition)
Azer, Haniyeh Sadeghi; Aghayi, Mohammad Bagher – Advances in Language and Literary Studies, 2015
This study aims to evaluate the translation quality of two machine translation systems in translating six different text-types, from English to Persian. The evaluation was based on criteria proposed by Van Slype (1979). The proposed model for evaluation is a black-box type, comparative and adequacy-oriented evaluation. To conduct the evaluation, a…
Descriptors: Computational Linguistics, Computer Software, Translation, Users (Information)