Publication Date
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 22 |
| Since 2017 (last 10 years) | 37 |
| Since 2007 (last 20 years) | 56 |
Descriptor
| Evaluation Methods | 123 |
| Evaluators | 123 |
| Interrater Reliability | 78 |
| Reliability | 41 |
| Comparative Analysis | 26 |
| Evaluation Criteria | 23 |
| Higher Education | 22 |
| Foreign Countries | 20 |
| Performance Based Assessment | 19 |
| Scoring | 19 |
| Validity | 19 |
| More ▼ | |
Source
Author
Publication Type
Education Level
Audience
| Practitioners | 4 |
| Researchers | 4 |
| Teachers | 2 |
Location
| Australia | 3 |
| Israel | 3 |
| United Kingdom (England) | 3 |
| China | 2 |
| Spain | 2 |
| Tennessee | 2 |
| United Kingdom | 2 |
| Vietnam | 2 |
| Canada | 1 |
| Cuba | 1 |
| Europe | 1 |
| More ▼ | |
Laws, Policies, & Programs
| No Child Left Behind Act 2001 | 1 |
| Race to the Top | 1 |
Assessments and Surveys
| National Assessment of… | 1 |
| Praxis Series | 1 |
| Stanford Achievement Tests | 1 |
| Test of English as a Foreign… | 1 |
| Torrance Tests of Creative… | 1 |
What Works Clearinghouse Rating
Janice Kinghorn; Katherine McGuire; Bethany L. Miller; Aaron Zimmerman – Assessment Update, 2024
In this article, the authors share their reflections on how different experiences and paradigms have broadened their understanding of the work of assessment in higher education. As they collaborated to create a panel for the 2024 International Conference on Assessing Quality in Higher Education, they recognized that they, as assessment…
Descriptors: Higher Education, Assessment Literacy, Evaluation Criteria, Evaluation Methods
Robert H. Eaglen; Steven J. Durning; Holly S. Meyer; Christopher S. Candler – Quality in Higher Education, 2024
Higher education accreditation has spread internationally as a vehicle for quality assurance and improvement but is strongly influenced by accreditation practices in the United States. The organisational structure and processes of seven United States health professions accreditors were analysed to identify common characteristics that reflect…
Descriptors: Accreditation (Institutions), Quality Assurance, Evaluators, Evaluation Methods
Yuan Tian; Xi Yang; Suhail A. Doi; Luis Furuya-Kanamori; Lifeng Lin; Joey S. W. Kwong; Chang Xu – Research Synthesis Methods, 2024
RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding the risk of bias assessment based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two…
Descriptors: Risk, Randomized Controlled Trials, Classification, Robotics
Darvishi, Ali; Khosravi, Hassan; Rahimi, Afshin; Sadiq, Shazia; Gasevic, Dragan – IEEE Transactions on Learning Technologies, 2023
Engaging students in creating learning resources has demonstrated pedagogical benefits. However, to effectively utilize a repository of student-generated content (SGC), a selection process is needed to separate high- from low-quality resources as some of the resources created by students can be ineffective, inappropriate, or incorrect. A common…
Descriptors: Student Developed Materials, Educational Assessment, Peer Evaluation, Evaluation Methods
Bijani, Houman; Hashempour, Bahareh; Ibrahim, Khaled Ahmed Abdel-Al; Orabah, Salim Said Bani; Heydarnejad, Tahereh – Language Testing in Asia, 2022
Due to subjectivity in oral assessment, much concentration has been put on obtaining a satisfactory measure of consistency among raters. However, the process for obtaining more consistency might not result in valid decisions. One matter that is at the core of both reliability and validity in oral assessment is rater training. Recently,…
Descriptors: Oral Language, Language Tests, Feedback (Response), Bias
Ilona Rinne – Assessment & Evaluation in Higher Education, 2024
It is widely acknowledged in research that common criteria and aligned standards do not result in consistent assessment of such a complex performance as the final undergraduate thesis. Assessment is determined by examiners' understanding of rubrics and their views on thesis quality. There is still a gap in the research literature about how…
Descriptors: Foreign Countries, Undergraduate Students, Teacher Education Programs, Evaluation Criteria
Ping-Lin Chuang – Language Testing, 2025
This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…
Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources
Huiying Cai; Xun Yan – Language Testing, 2024
Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…
Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation
Chao Han; Binghan Zheng; Mingqing Xie; Shirong Chen – Interpreter and Translator Trainer, 2024
Human raters' assessment of interpreting is a complex process. Previous researchers have mainly relied on verbal reports to examine this process. To advance our understanding, we conducted an empirical study, collecting raters' eye-movement and retrospection data in a computerised interpreting assessment in which three groups of raters (n = 35)…
Descriptors: Foreign Countries, College Students, College Graduates, Interrater Reliability
Rossin, Emily G.; Bergee, Martin J. – Journal of Research in Music Education, 2021
This is the sixth and culminating study in a series whose purpose has been to acquire a conceptual understanding of school band performance and to develop an assessment based on this understanding. With the present study, we cross-validated and applied a rating scale for school band performance. In the cross-validation phase, college students…
Descriptors: Music Education, Music Activities, Music, Performance
Cristina Menescardi; Aida Carballo-Fazanes; Núria Ortega-Benavent; Isaac Estevan – Journal of Motor Learning and Development, 2024
The Canadian Agility and Movement Skill Assessment (CAMSA) is a valid and reliable circuit-based test of motor competence which can be used to assess children's skills in a live or recorded performance and then coded. We aimed to analyze the intrarater reliability of the CAMSA scores (total, time, and skill score) and time measured, by comparing…
Descriptors: Interrater Reliability, Evaluators, Scoring, Psychomotor Skills
Zahn, Daniela; Canton, Ursula; Boyd, Victoria; Hamilton, Laura; Mamo, Josianne; McKay, Jane; Proudfoot, Linda; Telfer, Dickson; Williams, Kim; Wilson, Colin – Studies in Higher Education, 2021
Evaluating the impact of Academic Literacies teaching (Lea and Street [1998. "Student Writing in Higher Education: An Academic Literacies Approach." "Studies in Higher Education" 23 (2): 157-72. doi:10.1080/03075079812331380364]) is difficult, as it involves gauging whether writers: (1) gain better understanding of what…
Descriptors: Writing Evaluation, Evaluation Methods, Undergraduate Students, Foreign Countries
Hunter, Seth B. – Journal of Education Human Resources, 2023
Teacher performance scores inform education leaders' management of teacher human resources. However, prior research has implied that different interpretations of performance criteria between teachers and their evaluators suppress teacher development. Although research has examined teacher perceptions of performance scores and compared teacher…
Descriptors: Teacher Evaluation, Teacher Effectiveness, Self Evaluation (Individuals), Interrater Reliability
Gill, Tim – Research Matters, 2022
In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them in terms of which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…
Descriptors: Comparative Analysis, Decision Making, Scripts, Standards
Walland, Emma – Research Matters, 2022
In this article, I report on examiners' views and experiences of using Pairwise Comparative Judgement (PCJ) and Rank Ordering (RO) as alternatives to traditional analytical marking for GCSE English Language essays. Fifteen GCSE English Language examiners took part in the study. After each had judged 100 pairs of essays using PCJ and eight packs of…
Descriptors: Essays, Grading, Writing Evaluation, Evaluators

Peer reviewed
Direct link
