Publication Date
| In 2026 | 0 |
| Since 2025 | 3 |
| Since 2022 (last 5 years) | 31 |
| Since 2017 (last 10 years) | 60 |
| Since 2007 (last 20 years) | 101 |
Descriptor
| Evaluators | 142 |
| Reliability | 142 |
| Validity | 52 |
| Evaluation Methods | 41 |
| Foreign Countries | 32 |
| Comparative Analysis | 30 |
| Scores | 23 |
| Scoring | 23 |
| Correlation | 22 |
| Interrater Reliability | 21 |
| Second Language Learning | 19 |
Author
| Chambers, Lucy | 2 |
| Everson, Mark D. | 2 |
| Goe, Laura | 2 |
| Holdheide, Lynn | 2 |
| Impara, James C. | 2 |
| Kugle, C. L. | 2 |
| Lin, Chih-Kai | 2 |
| Miller, Tricia | 2 |
| Myford, Carol M. | 2 |
| Plake, Barbara S. | 2 |
| Sandoval, Jose Miguel | 2 |
Audience
| Researchers | 4 |
| Policymakers | 1 |
Location
| Florida | 4 |
| Turkey | 4 |
| Australia | 3 |
| China | 3 |
| United Kingdom (England) | 3 |
| Illinois | 2 |
| Indonesia | 2 |
| Japan | 2 |
| United Kingdom | 2 |
| Canada | 1 |
| Cayman Islands | 1 |
Laws, Policies, & Programs
| Race to the Top | 2 |
| No Child Left Behind Act 2001 | 1 |
Brogan L. Barr; Virginia V. W. McIntosh; Eileen F. Britt; Jennifer Jordan; Janet D. Carter – Measurement: Interdisciplinary Research and Perspectives, 2024
Even when raters demonstrate agreement in the use of a measure, limited score variability or violation of often-ignored statistical assumptions can result in lower reliability estimates than intuitively expected. This article uses data drawn from two randomized controlled trials of schema therapy and cognitive behavioral therapy for the treatment…
Descriptors: Evaluators, Interrater Reliability, Reliability, Measurement Techniques
Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024
The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…
Descriptors: Accuracy, Reliability, Computational Linguistics, Standards
Ngoc My Bui; Jessie S. Barrot – Education and Information Technologies, 2025
With the generative artificial intelligence (AI) tool's remarkable capabilities in understanding and generating meaningful content, intriguing questions have been raised about its potential as an automated essay scoring (AES) system. One such tool is ChatGPT, which is capable of scoring any written work based on predefined criteria. However,…
Descriptors: Artificial Intelligence, Natural Language Processing, Technology Uses in Education, Automation
Robert H. Eaglen; Steven J. Durning; Holly S. Meyer; Christopher S. Candler – Quality in Higher Education, 2024
Higher education accreditation has spread internationally as a vehicle for quality assurance and improvement but is strongly influenced by accreditation practices in the United States. The organisational structure and processes of seven United States health professions accreditors were analysed to identify common characteristics that reflect…
Descriptors: Accreditation (Institutions), Quality Assurance, Evaluators, Evaluation Methods
Li, Wentao – Reading and Writing: An Interdisciplinary Journal, 2022
Scoring rubrics are known to be effective for assessing writing for both testing and classroom teaching purposes. How raters interpret the descriptors in a rubric can significantly impact the subsequent final score, and further, the descriptors may also color a rater's judgment of a student's writing quality. Little is known, however, about how…
Descriptors: Scoring Rubrics, Interrater Reliability, Writing Evaluation, Teaching Methods
Da Fonte, M. Alexandra; Wolfe, Nicole P.; DeLuca, Emily R.; Cavagnini, Melissa J.; Nardi, Krista L. – Journal of Special Education Technology, 2023
Mobile technologies, including apps, have become increasingly popular, and are being used to support daily activities among a variety of individuals. While the use of mobile technologies will not eliminate barriers often faced by individuals with disabilities, these systems have the potential to help minimize some of these barriers. As the…
Descriptors: Handheld Devices, Computer Oriented Programs, Evaluation, Disabilities
Yuan Tian; Xi Yang; Suhail A. Doi; Luis Furuya-Kanamori; Lifeng Lin; Joey S. W. Kwong; Chang Xu – Research Synthesis Methods, 2024
RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding the risk of bias assessment based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two…
Descriptors: Risk, Randomized Controlled Trials, Classification, Robotics
Qusai Khraisha; Sophie Put; Johanna Kappenberg; Azza Warraitch; Kristin Hadfield – Research Synthesis Methods, 2024
Systematic reviews are vital for guiding practice, research and policy, although they are often slow and labour-intensive. Large language models (LLMs) could speed up and automate systematic reviews, but their performance in such tasks has yet to be comprehensively evaluated against humans, and no study has tested Generative Pre-Trained…
Descriptors: Peer Evaluation, Research Reports, Artificial Intelligence, Computer Software
Darvishi, Ali; Khosravi, Hassan; Rahimi, Afshin; Sadiq, Shazia; Gasevic, Dragan – IEEE Transactions on Learning Technologies, 2023
Engaging students in creating learning resources has demonstrated pedagogical benefits. However, to effectively utilize a repository of student-generated content (SGC), a selection process is needed to separate high- from low-quality resources as some of the resources created by students can be ineffective, inappropriate, or incorrect. A common…
Descriptors: Student Developed Materials, Educational Assessment, Peer Evaluation, Evaluation Methods
Park, Yeonggwang; Cádiz, Manuel Díaz; Nagle, Kathleen F.; Stepp, Cara E. – Journal of Speech, Language, and Hearing Research, 2020
Purpose: Assessment of strained voice quality is difficult due to the weak reliability of auditory-perceptual evaluation and lack of strong acoustic correlates. This study evaluated the contributions of relative fundamental frequency (RFF) and mid-to-high frequency noise to the perception of strain. Method: Stimuli were created using recordings of…
Descriptors: Acoustics, Audio Equipment, Auditory Perception, Correlation
Sunde, Eleah; Briggs, Adam M.; Mitteer, Daniel R. – Journal of Applied Behavior Analysis, 2022
Prior research has evaluated the reliability and validity of structured visual inspection (SVI) criteria for interpreting functional analysis (FA) outcomes (Hagopian et al., 1997; Roane et al., 2013). We adapted these criteria to meet the unique needs of interpreting latency-based FA outcomes and examined the reliability and validity of applying…
Descriptors: Reliability, Validity, Visual Perception, Evaluation Criteria
Huiying Cai; Xun Yan – Language Testing, 2024
Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…
Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation
Rossin, Emily G.; Bergee, Martin J. – Journal of Research in Music Education, 2021
This is the sixth and culminating study in a series whose purpose has been to acquire a conceptual understanding of school band performance and to develop an assessment based on this understanding. With the present study, we cross-validated and applied a rating scale for school band performance. In the cross-validation phase, college students…
Descriptors: Music Education, Music Activities, Music, Performance
Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024
Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…
Descriptors: Semantics, Educational Assessment, Evaluators, Reliability
Denis Dumas; James C. Kaufman – Educational Psychology Review, 2024
Who should evaluate the originality and task-appropriateness of a given idea has been a perennial debate among psychologists of creativity. Here, we argue that the most relevant evaluator of a given idea depends crucially on the level of expertise of the person who generated it. To build this argument, we draw on two complementary theoretical…
Descriptors: Decision Making, Creativity, Task Analysis, Psychologists

