ERIC - Search Results

Publication Date

In 2026	0
Since 2025	3
Since 2022 (last 5 years)	31
Since 2017 (last 10 years)	60
Since 2007 (last 20 years)	101

Descriptor

Evaluators	142
Reliability	142
Validity	52
Evaluation Methods	41
Foreign Countries	32
Comparative Analysis	30
Scores	23
Scoring	23
Correlation	22
Interrater Reliability	21
Second Language Learning	19
Evaluation Criteria	18
Rating Scales	18
Writing Evaluation	18
English (Second Language)	17
Decision Making	16
Higher Education	16
Teacher Evaluation	16
Scoring Rubrics	15
Standards	13
Teaching Methods	13
Classroom Observation…	12
Computer Software	12
Language Tests	12
Second Language Instruction	12

Publication Type

Reports - Research	103
Journal Articles	100
Reports - Evaluative	16
Speeches/Meeting Papers	15
Dissertations/Theses -…	9
Tests/Questionnaires	9
Reports - Descriptive	8
Guides - Non-Classroom	4
Information Analyses	3
Guides - General	1
Numerical/Quantitative Data	1
Opinion Papers	1

Education Level

Higher Education	30
Postsecondary Education	27
Secondary Education	8
Elementary Secondary Education	6
Elementary Education	4
Middle Schools	3
Early Childhood Education	2
High Schools	2
Junior High Schools	2
Kindergarten	2
Grade 4	1
Grade 6	1
Intermediate Grades	1
Preschool Education	1
Primary Education	1

Audience

Researchers	4
Policymakers	1

Location

Florida	4
Turkey	4
Australia	3
China	3
United Kingdom (England)	3
Illinois	2
Indonesia	2
Japan	2
United Kingdom	2
Canada	1
Cayman Islands	1
Colombia	1
Cyprus	1
Ecuador	1
Europe	1
Finland	1
Hong Kong	1
Idaho	1
India	1
Iran	1
Louisiana	1
Maryland	1
Massachusetts	1
Minnesota	1
New Zealand	1

Laws, Policies, & Programs

Race to the Top	2
No Child Left Behind Act 2001	1

Assessments and Surveys

Flanders System of…	4
International English…	2
Test of English as a Foreign…	2
Behavior Assessment System…	1
Flesch Kincaid Grade Level…	1
Minnesota Teacher Attitude…	1
Strengths and Difficulties…	1
Systematic Screening for…	1

Showing 1 to 15 of 142 results

Visualizing Agreement: Bland-Altman Plots as a Supplement to Inter-Rater Reliability Indices

Peer reviewed

Brogan L. Barr; Virginia V. W. McIntosh; Eileen F. Britt; Jennifer Jordan; Janet D. Carter – Measurement: Interdisciplinary Research and Perspectives, 2024

Even when raters demonstrate agreement in the use of a measure, limited score variability or violation of often-ignored statistical assumptions can result in lower reliability estimates than intuitively expected. This article uses data drawn from two randomized controlled trials of schema therapy and cognitive behavioral therapy for the treatment…

Descriptors: Evaluators, Interrater Reliability, Reliability, Measurement Techniques

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards

ChatGPT as an Automated Essay Scoring Tool in the Writing Classrooms: How It Compares with Human Scoring

Peer reviewed

Ngoc My Bui; Jessie S. Barrot – Education and Information Technologies, 2025

With the generative artificial intelligence (AI) tool's remarkable capabilities in understanding and generating meaningful content, intriguing questions have been raised about its potential as an automated essay scoring (AES) system. One such tool is ChatGPT, which is capable of scoring any written work based on predefined criteria. However,…

Descriptors: Artificial Intelligence, Natural Language Processing, Technology Uses in Education, Automation

Evaluating the Evaluators: Analysis of the Structure and Processes of Seven United States Health Professions Education Accreditors

Peer reviewed

Robert H. Eaglen; Steven J. Durning; Holly S. Meyer; Christopher S. Candler – Quality in Higher Education, 2024

Higher education accreditation has spread internationally as a vehicle for quality assurance and improvement but is strongly influenced by accreditation practices in the United States. The organisational structure and processes of seven United States health professions accreditors were analysed to identify common characteristics that reflect…

Descriptors: Accreditation (Institutions), Quality Assurance, Evaluators, Evaluation Methods

Scoring Rubric Reliability and Internal Validity in Rater-Mediated EFL Writing Assessment: Insights from Many-Facet Rasch Measurement

Peer reviewed

Li, Wentao – Reading and Writing: An Interdisciplinary Journal, 2022

Scoring rubrics are known to be effective for assessing writing for both testing and classroom teaching purposes. How raters interpret the descriptors in a rubric can significantly impact the subsequent final score, and further, the descriptors may also color a rater's judgment of a student's writing quality. Little is known, however, about how…

Descriptors: Scoring Rubrics, Interrater Reliability, Writing Evaluation, Teaching Methods

An App Evaluation System for All Stakeholders: A Pilot Study

Peer reviewed

Da Fonte, M. Alexandra; Wolfe, Nicole P.; DeLuca, Emily R.; Cavagnini, Melissa J.; Nardi, Krista L. – Journal of Special Education Technology, 2023

Mobile technologies, including apps, have become increasingly popular, and are being used to support daily activities among a variety of individuals. While the use of mobile technologies will not eliminate barriers often faced by individuals with disabilities, these systems have the potential to help minimize some of these barriers. As the…

Descriptors: Handheld Devices, Computer Oriented Programs, Evaluation, Disabilities

Towards the Automatic Risk of Bias Assessment on Randomized Controlled Trials: A Comparison of RobotReviewer and Humans

Peer reviewed

Yuan Tian; Xi Yang; Suhail A. Doi; Luis Furuya-Kanamori; Lifeng Lin; Joey S. W. Kwong; Chang Xu – Research Synthesis Methods, 2024

RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding the risk of bias assessment based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two…

Descriptors: Risk, Randomized Controlled Trials, Classification, Robotics

Can Large Language Models Replace Humans in Systematic Reviews? Evaluating GPT-4's Efficacy in Screening and Extracting Data from Peer-Reviewed and Grey Literature in Multiple Languages

Peer reviewed

Qusai Khraisha; Sophie Put; Johanna Kappenberg; Azza Warraitch; Kristin Hadfield – Research Synthesis Methods, 2024

Systematic reviews are vital for guiding practice, research and policy, although they are often slow and labour-intensive. Large language models (LLMs) could speed up and automate systematic reviews, but their performance in such tasks has yet to be comprehensively evaluated against humans, and no study has tested Generative Pre-Trained…

Descriptors: Peer Evaluation, Research Reports, Artificial Intelligence, Computer Software

Assessing the Quality of Student-Generated Content at Scale: A Comparative Analysis of Peer-Review Models

Peer reviewed

Darvishi, Ali; Khosravi, Hassan; Rahimi, Afshin; Sadiq, Shazia; Gasevic, Dragan – IEEE Transactions on Learning Technologies, 2023

Engaging students in creating learning resources has demonstrated pedagogical benefits. However, to effectively utilize a repository of student-generated content (SGC), a selection process is needed to separate high- from low-quality resources as some of the resources created by students can be ineffective, inappropriate, or incorrect. A common…

Descriptors: Student Developed Materials, Educational Assessment, Peer Evaluation, Evaluation Methods

Perceptual and Acoustic Assessment of Strain Using Synthetically Modified Voice Samples

Peer reviewed

Park, Yeonggwang; Cádiz, Manuel Díaz; Nagle, Kathleen F.; Stepp, Cara E. – Journal of Speech, Language, and Hearing Research, 2020

Purpose: Assessment of strained voice quality is difficult due to the weak reliability of auditory-perceptual evaluation and lack of strong acoustic correlates. This study evaluated the contributions of relative fundamental frequency (RFF) and mid-to-high frequency noise to the perception of strain. Method: Stimuli were created using recordings of…

Descriptors: Acoustics, Audio Equipment, Auditory Perception, Correlation

Reliability and Validity of Using Structured Visual-Inspection Criteria to Interpret Latency-Based Functional Analysis Outcomes

Peer reviewed

Sunde, Eleah; Briggs, Adam M.; Mitteer, Daniel R. – Journal of Applied Behavior Analysis, 2022

Prior research has evaluated the reliability and validity of structured visual inspection (SVI) criteria for interpreting functional analysis (FA) outcomes (Hagopian et al., 1997; Roane et al., 2013). We adapted these criteria to meet the unique needs of interpreting latency-based FA outcomes and examined the reliability and validity of applying…

Descriptors: Reliability, Validity, Visual Perception, Evaluation Criteria

Triangulating Natural Language Processing (NLP)-Based Analysis of Rater Comments and Many-Facet Rasch Measurement (MFRM): An Innovative Approach to Investigating Raters' Application of Rating Scales in Writing Assessment

Peer reviewed

Huiying Cai; Xun Yan – Language Testing, 2024

Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…

Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation

Cross-Validation and Application of a Scale Assessing School Band Performance

Peer reviewed

Rossin, Emily G.; Bergee, Martin J. – Journal of Research in Music Education, 2021

This is the sixth and culminating study in a series whose purpose has been to acquire a conceptual understanding of school band performance and to develop an assessment based on this understanding. With the present study, we cross-validated and applied a rating scale for school band performance. In the cross-validation phase, college students…

Descriptors: Music Education, Music Activities, Music, Performance

A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement

Peer reviewed

Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024

Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…

Descriptors: Semantics, Educational Assessment, Evaluators, Reliability

Evaluation Is Creation: Self and Social Judgments of Creativity across the Four-C Model

Peer reviewed

Denis Dumas; James C. Kaufman – Educational Psychology Review, 2024

Who should evaluate the originality and task-appropriateness of a given idea has been a perennial debate among psychologists of creativity. Here, we argue that the most relevant evaluator of a given idea depends crucially on the level of expertise of the person who generated it. To build this argument, we draw on two complementary theoretical…

Descriptors: Decision Making, Creativity, Task Analysis, Psychologists

Source

ProQuest LLC	9
Language Testing	5
Language Testing in Asia	5
Journal of Speech, Language,…	4
Research Matters	4
Applied Measurement in…	3
Advances in Health Sciences…	2
British Journal of…	2
Educational Assessment	2
International Educational…	2
International Journal of…	2
Journal of Clinical Child and…	2
Journal of Research in Music…	2
Language Learning	2
Language Teaching Research…	2
National Comprehensive Center…	2
New Directions for Evaluation	2
Research Synthesis Methods	2
Studies in Educational…	2
AERA Online Paper Repository	1
Advances in Physiology…	1
American Journal of Evaluation	1
Assessing Writing	1
Behavioral Disorders	1
CALICO Journal	1

Author

Chambers, Lucy	2
Everson, Mark D.	2
Goe, Laura	2
Holdheide, Lynn	2
Impara, James C.	2
Kugle, C. L.	2
Lin, Chih-Kai	2
Miller, Tricia	2
Myford, Carol M.	2
Plake, Barbara S.	2
Sandoval, Jose Miguel	2
Sata, Mehmet	2
Zhang, Xiuyuan	2
Abdul Gafoor, K.	1
Akbari, Alireza	1
Allan S. Cohen	1
Allely, C. S.	1
Amanda Huee-Ping Wong	1
Anwyll, Steve	1
Apple, Kristen	1
Aryadoust, Vahid	1
Azza Warraitch	1
Baba, Yukino	1
Baer, Donald M.	1