ERIC - Search Results

Publication Date

In 2025	0
Since 2024	3
Since 2021 (last 5 years)	9
Since 2016 (last 10 years)	15
Since 2006 (last 20 years)	24

Descriptor

Evaluation Methods	41
Evaluators	41
Scoring	41
Interrater Reliability	16
Foreign Countries	10
Second Language Learning	8
Computer Software	7
English (Second Language)	7
Language Tests	7
Performance Based Assessment	7
Comparative Analysis	6
Computer Assisted Testing	6
Educational Assessment	6
Higher Education	6
Scores	6
Writing Evaluation	6
Accuracy	5
Correlation	5
Essays	5
Rating Scales	5
Teacher Evaluation	5
Elementary School Students	4
Portfolios (Background…	4
Program Effectiveness	4
Reliability	4
More ▼

Publication Type

Reports - Research	32
Journal Articles	28
Reports - Evaluative	6
Speeches/Meeting Papers	6
Tests/Questionnaires	4
Reports - Descriptive	2
ERIC Digests in Full Text	1
ERIC Publications	1
Guides - Non-Classroom	1
Information Analyses	1
Opinion Papers	1
Reference Materials -…	1
More ▼

Education Level

Higher Education	6
Postsecondary Education	6
Elementary Education	3
Early Childhood Education	2
Elementary Secondary Education	2
Primary Education	2
Grade 1	1
Grade 2	1
Grade 4	1
Grade 6	1
High Schools	1
Intermediate Grades	1
Kindergarten	1
Middle Schools	1
Secondary Education	1
More ▼

Audience

Location

China	3
Algeria	1
California (Los Angeles)	1
Europe	1
Netherlands	1
Nigeria	1
Spain	1
Texas	1
United Kingdom	1
United States	1
Vietnam	1
More ▼

Laws, Policies, & Programs

Assessments and Surveys

Test of English as a Foreign…	3
National Assessment of…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 41 results Save | Export

Effect of Immediate Elaborated Feedback on Rater Accuracy. Research Report. ETS RR-20-09

Peer reviewed
PDF on ERIC

Download full text

Attali, Yigal – ETS Research Report Series, 2020

Principles of skill acquisition dictate that raters should be provided with frequent feedback about their ratings. However, in current operational practice, raters rarely receive immediate feedback about their scores owing to the prohibitive effort required to generate such feedback. An approach for generating and administering feedback responses…

Descriptors: Feedback (Response), Evaluators, Accuracy, Scores

Exploring an Alternative to Record Motor Competence Assessment: Interrater and Intrarater Audio-Video Reliability

Peer reviewed

Direct link

Cristina Menescardi; Aida Carballo-Fazanes; Núria Ortega-Benavent; Isaac Estevan – Journal of Motor Learning and Development, 2024

The Canadian Agility and Movement Skill Assessment (CAMSA) is a valid and reliable circuit-based test of motor competence which can be used to assess children's skills in a live or recorded performance and then coded. We aimed to analyze the intrarater reliability of the CAMSA scores (total, time, and skill score) and time measured, by comparing…

Descriptors: Interrater Reliability, Evaluators, Scoring, Psychomotor Skills

Examining the Effect of Assessment Construct Characteristics on Machine Learning Scoring of Scientific Argumentation

Peer reviewed

Direct link

Kevin C. Haudek; Xiaoming Zhai – International Journal of Artificial Intelligence in Education, 2024

Argumentation, a key scientific practice presented in the "Framework for K-12 Science Education," requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open response assessments, leveraging…

Descriptors: Accuracy, Persuasive Discourse, Artificial Intelligence, Learning Management Systems

Beyond Agreement: Exploring Rater Effects in Large-Scale Mixed Format Assessments

Peer reviewed

Direct link

Wind, Stefanie A.; Guo, Wenjing – Educational Assessment, 2021

Scoring procedures for the constructed-response (CR) items in large-scale mixed-format educational assessments often involve checks for rater agreement or rater reliability. Although these analyses are important, researchers have documented rater effects that persist despite rater training and that are not always detected in rater agreement and…

Descriptors: Scoring, Responses, Test Items, Test Format

Combining Human and Automated Scoring Methods in Experimental Assessments of Writing: A Case Study Tutorial

Peer reviewed

Direct link

Reagan Mozer; Luke Miratrix; Jackie Eunjung Relyea; James S. Kim – Journal of Educational and Behavioral Statistics, 2024

In a randomized trial that collects text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by human raters. An impact analysis can then be conducted to compare treatment and control groups, using the hand-coded scores as a measured outcome. This…

Descriptors: Scoring, Evaluation Methods, Writing Evaluation, Comparative Analysis

Towards More Valid Scoring Criteria for Integrated Reading-Writing and Listening-Writing Summary Tasks

Peer reviewed

Direct link

Chan, Sathena; May, Lyn – Language Testing, 2023

Despite the increased use of integrated tasks in high-stakes academic writing assessment, research on rating criteria which reflect the unique construct of integrated summary writing skills is comparatively rare. Using a mixed-method approach of expert judgement, text analysis, and statistical analysis, this study examines writing features that…

Descriptors: Scoring, Writing Evaluation, Reading Tests, Listening Skills

Validation of an Automated Procedure for Calculating Core Lexicon from Transcripts

Peer reviewed

Direct link

Dalton, Sarah Grace; Stark, Brielle C.; Fromm, Davida; Apple, Kristen; MacWhinney, Brian; Rensch, Amanda; Rowedder, Madyson – Journal of Speech, Language, and Hearing Research, 2022

Purpose: The aim of this study was to advance the use of structured, monologic discourse analysis by validating an automated scoring procedure for core lexicon (CoreLex) using transcripts. Method: Forty-nine transcripts from persons with aphasia and 48 transcripts from persons with no brain injury were retrieved from the AphasiaBank database. Five…

Descriptors: Validity, Discourse Analysis, Databases, Scoring

Can Automated Machine Translation Evaluation Metrics Be Used to Assess Students' Interpretation in the Language Learning Classroom?

Peer reviewed

Direct link

Han, Chao; Lu, Xiaolei – Computer Assisted Language Learning, 2023

The use of translation and interpreting (T&I) in the language learning classroom is commonplace, serving various pedagogical and assessment purposes. Previous utilization of T&I exercises is driven largely by their potential to enhance language learning, whereas the latest trend has begun to underscore T&I as a crucial skill to be…

Descriptors: Translation, Computational Linguistics, Correlation, Language Processing

Effects of Second Language Pronunciation Teaching Revisited: A Proposed Measurement Framework and Meta-Analysis

Peer reviewed

Direct link

Saito, Kazuya; Plonsky, Luke – Language Learning, 2019

We propose a new framework for conceptualizing measures of instructed second language (L2) pronunciation performance according to three sets of parameters: (a) the constructs (focused on global vs. specific aspects of pronunciation), (b) the scoring method (human raters vs. acoustic analyses), and (c) the type of knowledge elicited (controlled vs.…

Descriptors: Second Language Learning, Second Language Instruction, Scoring, Pronunciation Instruction

The Processes of Rating L2 Speaking Performance Using an Analytic Rating Scale -- A Qualitative Exploration

Peer reviewed
PDF on ERIC

Download full text

Thai, Thuy; Sheehan, Susan – Language Education & Assessment, 2022

In language performance tests, raters are important as their scoring decisions determine which aspects of performance the scores represent; however, raters are considered as one of the potential sources contributing to unwanted variability in scores (Davis, 2012). Although a great number of studies have been conducted to unpack how rater…

Descriptors: Rating Scales, Speech Communication, Second Language Learning, Second Language Instruction

Fuzzy Logic Applied for Pronunciation Assessment

Peer reviewed

Direct link

Bahi, Halima; Necibi, Khaled – International Journal of Computer-Assisted Language Learning and Teaching, 2020

Pronunciation teaching is an important stage in language learning activities. This article tackles the pronunciation scoring problem where research has demonstrated relatively low human-human and low human-machine agreement rates, which makes teachers skeptical about their relevance. To overcome these limitations, a fuzzy combination of two…

Descriptors: Oral Reading, Reading Fluency, Pronunciation, Learning Activities

A Comparative Judgment Approach to Assessing Chinese Sign Language Interpreting

Peer reviewed

Direct link

Han, Chao; Xiao, Xiaoyan – Language Testing, 2022

The quality of sign language interpreting (SLI) is a gripping construct among practitioners, educators and researchers, calling for reliable and valid assessment. There has been a diverse array of methods in the extant literature to measure SLI quality, ranging from traditional error analysis to recent rubric scoring. In this study, we want to…

Descriptors: Comparative Analysis, Sign Language, Deaf Interpreting, Evaluators

Administrators' Uses of Teacher Observation Protocol in Different Rating Contexts. Research Report. ETS RR-18-18

Peer reviewed
PDF on ERIC

Download full text

Qi, Yi; Bell, Courtney A.; Jones, Nathan D.; Lewis, Jennifer M.; Witherspoon, Margaret W.; Redash, Amanda – ETS Research Report Series, 2018

Teacher observations are being used for high-stakes purposes in states across the country, and administrators often serve as raters in teacher evaluation systems. This paper examines how the cognitive aspects of administrators' use of an observation instrument, a modified version of Charlotte Danielson's Framework for Teaching, interact with the…

Descriptors: Teacher Evaluation, Classroom Observation Techniques, Observation, Evaluation Methods

Comparison of Automatic and Expert Teachers' Rating of Computerized English Listening-Speaking Test

Peer reviewed
PDF on ERIC

Download full text

Linlin, Cao – English Language Teaching, 2020

Through Many-Facet Rasch analysis, this study explores the rating differences between 1 computer automatic rater and 5 expert teacher raters on scoring 119 students in a computerized English listening-speaking test. Results indicate that both automatic and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…

Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning

Working with Sparse Data in Rated Language Tests: Generalizability Theory Applications

Peer reviewed

Direct link

Lin, Chih-Kai – Language Testing, 2017

Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…

Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy

Previous Page | Next Page »

Pages: 1 | 2 | 3

ETS Research Report Series	4
Language Testing	3
Studies in Educational…	2
Assessment & Evaluation in…	1
Assessment and Evaluation in…	1
Bill & Melinda Gates…	1
Computer Assisted Language…	1
Contemporary Issues in…	1
Educational Assessment	1
Educational Measurement:…	1
Educational and Psychological…	1
English Language Teaching	1
Florida Journal of…	1
International Journal of…	1
International Journal of…	1
Journal of Educational and…	1
Journal of Experimental…	1
Journal of Motor Learning and…	1
Journal of Speech, Language,…	1
Language Education &…	1
Language Learning	1
Language Testing in Asia	1
Physical Educator	1
RAND Europe	1
More ▼

Bejar, Isaac I.	2
Han, Chao	2
Wolfe, Edward W.	2
Aida Carballo-Fazanes	1
Akpe, C. S.	1
Apache, R. R.	1
Apple, Kristen	1
Archer, Jeff	1
Attali, Yigal	1
Bahi, Halima	1
Baker, Eva L.	1
Bakker, Mirjam E. J.	1
Beijaard, Douwe	1
Bell, Courtney A.	1
Bolton, Dale L.	1
Bridgeford, Nancy J., Comp.	1
Bridgeman, Brent	1
Brown, Michelle Stallone	1
Cantrell, Steve	1
Castle-Clarke, Sophie	1
Chan, Sathena	1
Collier, Michael	1
Crews, William E., Jr.	1
Cristina Menescardi	1
More ▼