Publication Date
| In 2026 | 0 |
| Since 2025 | 2 |
| Since 2022 (last 5 years) | 10 |
| Since 2017 (last 10 years) | 20 |
| Since 2007 (last 20 years) | 45 |
Descriptor
| Evaluators | 63 |
| Validity | 63 |
| Reliability | 52 |
| Evaluation Methods | 19 |
| Foreign Countries | 18 |
| Interrater Reliability | 18 |
| Comparative Analysis | 11 |
| Correlation | 11 |
| Rating Scales | 11 |
| Second Language Learning | 10 |
| Scoring Rubrics | 9 |
Author
| Coniam, David | 2 |
| Goe, Laura | 2 |
| Holdheide, Lynn | 2 |
| Miller, Tricia | 2 |
| Abraham, Anne | 1 |
| Akbari, Alireza | 1 |
| Alexander, Regi | 1 |
| Apple, Kristen | 1 |
| Aryadoust, Vahid | 1 |
| Baer, Donald M. | 1 |
| Balzotti, Jon | 1 |
Education Level
| Higher Education | 17 |
| Postsecondary Education | 15 |
| Secondary Education | 4 |
| Elementary Education | 3 |
| Elementary Secondary Education | 2 |
| Grade 4 | 2 |
| Grade 6 | 2 |
| Grade 3 | 1 |
| Grade 5 | 1 |
| Grade 7 | 1 |
| Grade 8 | 1 |
Audience
| Researchers | 2 |
| Policymakers | 1 |
Location
| Australia | 4 |
| United Kingdom | 3 |
| Hong Kong | 2 |
| Indonesia | 2 |
| Japan | 2 |
| California | 1 |
| Canada | 1 |
| China | 1 |
| Cyprus | 1 |
| Ecuador | 1 |
| Finland | 1 |
Laws, Policies, & Programs
| No Child Left Behind Act 2001 | 1 |
| Race to the Top | 1 |
Assessments and Surveys
| Test of English as a Foreign… | 2 |
| Behavior Assessment System… | 1 |
| Flesch Kincaid Grade Level… | 1 |
| Strengths and Difficulties… | 1 |
| Systematic Screening for… | 1 |
Sunde, Eleah; Briggs, Adam M.; Mitteer, Daniel R. – Journal of Applied Behavior Analysis, 2022
Prior research has evaluated the reliability and validity of structured visual inspection (SVI) criteria for interpreting functional analysis (FA) outcomes (Hagopian et al., 1997; Roane et al., 2013). We adapted these criteria to meet the unique needs of interpreting latency-based FA outcomes and examined the reliability and validity of applying…
Descriptors: Reliability, Validity, Visual Perception, Evaluation Criteria
Denis Dumas; James C. Kaufman – Educational Psychology Review, 2024
Who should evaluate the originality and task-appropriateness of a given idea has been a perennial debate among psychologists of creativity. Here, we argue that the most relevant evaluator of a given idea depends crucially on the level of expertise of the person who generated it. To build this argument, we draw on two complementary theoretical…
Descriptors: Decision Making, Creativity, Task Analysis, Psychologists
Nofrida, Eka R.; PH, Slamet; Prasojo, Lantip D.; Mahmudah, Fitri N. – Pegem Journal of Education and Instruction, 2022
This study aims to (1) produce an instrument for measuring college student entrepreneurial skills; (2) describe the quality of the measurement instrument for college students' entrepreneurial skills; (3) describe the practicality of the measurement instrument for college students' entrepreneurial skills. The method used in this study is the…
Descriptors: Measures (Individuals), College Students, Validity, Reliability
Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025
This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics
Gill, Tim – Research Matters, 2022
In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them in terms of which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…
Descriptors: Comparative Analysis, Decision Making, Scripts, Standards
Walland, Emma – Research Matters, 2022
In this article, I report on examiners' views and experiences of using Pairwise Comparative Judgement (PCJ) and Rank Ordering (RO) as alternatives to traditional analytical marking for GCSE English Language essays. Fifteen GCSE English Language examiners took part in the study. After each had judged 100 pairs of essays using PCJ and eight packs of…
Descriptors: Essays, Grading, Writing Evaluation, Evaluators
Yuichiro Yokouchi – Language Testing in Asia, 2025
The performance decision tree (PDT; Fulcher et al., 2011) is a rubric style that is applicable to performance assessment, with origins in Upshur and Turner's (1995) empirically derived binary-choice, boundary-definition (EBB) scale. It is easier for raters to assess performance by evaluating multiple binary-choice descriptors. Additionally,…
Descriptors: Scoring Rubrics, Second Language Learning, Second Language Instruction, Language Teachers
Pinargote-Ortega, Maricela; Bowen-Mendoza, Lorena; Meza, Jaime; Ventura, Sebastián – Journal of Computing in Higher Education, 2021
In this paper, we applied a peer assessment scenario at the Technical University of Manabí (Ecuador). Students and professors evaluated some works through rubrics, assigned a numerical score, and provided textual feedback grounding why such a numerical score was determined, to detect inaccuracy between both assessments. The proposed model uses…
Descriptors: Foreign Countries, College Students, Peer Evaluation, Scoring Rubrics
Kevin Ward – ProQuest LLC, 2022
The study established the validity and reliability of a weighted individual performance-based assessment tool within the utility scope of middle school orchestral strings. The following research questions guided this study: 1. What specific string-playing behaviors and corresponding criteria validate a weighted individual performance-based…
Descriptors: Music Education, Musical Instruments, Psychometrics, Music
Wikse Barrow, Carla; Nilsson Bjorkenstam, Kristina; Strombergsson, Sofia – Journal of Child Language, 2019
This study aimed to investigate concerns of validity and reliability in subjective ratings of age-of-acquisition (AoA), through exploring characteristics of the individual rater. An additional aim was to validate the obtained AoA ratings against two corpora -- one of child speech and one of adult speech -- specifically exploring whether words…
Descriptors: Language Acquisition, Evaluators, Validity, Reliability
Akbari, Alireza; Shahnazari, Mohammadtaghi – Language Testing in Asia, 2019
The present research paper introduces a translation evaluation method called Calibrated Parsing Items Evaluation (CPIE hereafter). This evaluation method maximizes translators' performance through identifying the parsing items with an optimal p-docimology and d-index (item discrimination). This method checks all the possible parses (annotations)…
Descriptors: Test Items, Translation, Computer Software, Evaluators
Jansen, Amanda; Smith, Ethan P.; Middleton, James A.; Cullicott, Catherine E. – North American Chapter of the International Group for the Psychology of Mathematics Education, 2021
The purpose of this report is to present our process and results for establishing validity and reliability of an observation tool used to investigate teaching practices that high school mathematics teachers use to engage students. We developed our tool using established practices, such as reviewing literature to develop a framework for instruction…
Descriptors: Mathematics Instruction, Secondary School Teachers, High School Teachers, Scoring Rubrics
Dalton, Sarah Grace; Stark, Brielle C.; Fromm, Davida; Apple, Kristen; MacWhinney, Brian; Rensch, Amanda; Rowedder, Madyson – Journal of Speech, Language, and Hearing Research, 2022
Purpose: The aim of this study was to advance the use of structured, monologic discourse analysis by validating an automated scoring procedure for core lexicon (CoreLex) using transcripts. Method: Forty-nine transcripts from persons with aphasia and 48 transcripts from persons with no brain injury were retrieved from the AphasiaBank database. Five…
Descriptors: Validity, Discourse Analysis, Databases, Scoring
Zhang, Xiuyuan – AERA Online Paper Repository, 2019
The main purpose of the study is to evaluate the qualities of human essay ratings for a large-scale assessment using Rasch measurement theory. Specifically, Many-Facet Rasch Measurement (MFRM) was utilized to examine the rating scale category structure and provide important information about interpretations of ratings in the large-scale…
Descriptors: Essays, Evaluators, Writing Evaluation, Reliability
Ghanbari, Nasim; Barati, Hossein – Language Testing in Asia, 2020
The present study reports the process of development and validation of a rating scale in the Iranian EFL academic writing assessment context. To achieve this goal, the study was conducted in three distinct phases. Early in the study, the researcher interviewed a number of raters in different universities. Next, a questionnaire was developed based…
Descriptors: Rating Scales, Writing Evaluation, English for Academic Purposes, Second Language Learning

