ERIC - Search Results

Showing 1 to 15 of 26 results

A Systematic Review of Methods for Evaluating Rating Quality in Language Assessment

Peer reviewed

Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018

The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on…

Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability

Kappa and Rater Accuracy: Paradigms and Parameters

Peer reviewed

Conger, Anthony J. – Educational and Psychological Measurement, 2017

Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…

Descriptors: Interrater Reliability, Evaluators, Accuracy, Statistical Analysis

Appraising the Scoring Performance of Automated Essay Scoring Systems--Some Additional Considerations: Which Essays? Which Human Raters? Which Scores?

Peer reviewed

Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018

The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…

Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators

The Impact of Rater Variability on Relationships among Different Effect-Size Indices for Inter-Rater Agreement between Human and Automated Essay Scoring

Yun, Jiyeo – ProQuest LLC, 2017

Since researchers investigated automatic scoring systems in writing assessments, they have dealt with relationships between human and machine scoring, and then have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…

Descriptors: Interrater Reliability, Essays, Scoring, Evaluators

Using Subjective and Objective Measures to Predict Level of Reading Fluency at the End of First Grade

Peer reviewed

Morris, Darrell; Pennell, Ashley M.; Perney, Jan; Trathen, Woodrow – Reading Psychology, 2018

This study compared reading rate to reading fluency (as measured by a rating scale). After listening to first graders read short passages, we assigned an overall fluency rating (low, average, or high) to each reading. We then used predictive discriminant analyses to determine which of five measures--accuracy, rate (objective); accuracy, phrasing,…

Descriptors: Reading Fluency, Prediction, Grade 1, Elementary School Students

Teachers' Cloud-Based Learning Designs: The Development of a Guiding Rubric Using the TPACK Framework

Peer reviewed

Al-Harthi, Aisha Salim Ali; Campbell, Chris; Karimi, Arafeh – Computers in the Schools, 2018

This study aimed to develop, validate, and trial a rubric for evaluating the cloud-based learning designs (CBLD) that were developed by teachers using virtual learning environments. The rubric was developed using the technological pedagogical content knowledge (TPACK) framework, with rubric development including content and expert validation of…

Descriptors: Computer Assisted Instruction, Scoring Rubrics, Interrater Reliability, Content Validity

The Influence of Training and Experience on Rater Performance in Scoring Spoken Language

Peer reviewed

Davis, Larry – Language Testing, 2016

Two factors were investigated that are thought to contribute to consistency in rater scoring judgments: rater training and experience in scoring. Also considered were the relative effects of scoring rubrics and exemplars on rater performance. Experienced teachers of English (N = 20) scored recorded responses from the TOEFL iBT speaking test prior…

Descriptors: Evaluators, Oral Language, Scores, Language Tests

Functional Adequacy in L2 Writing: Towards a New Rating Scale

Peer reviewed

Kuiken, Folkert; Vedder, Ineke – Language Testing, 2017

The importance of functional adequacy as an essential component of L2 proficiency has been observed by several authors (Pallotti, 2009; De Jong, Steinel, Florijn, Schoonen, & Hulstijn, 2012a, b). The rationale underlying the present study is that the assessment of writing proficiency in L2 is not fully possible without taking into account the…

Descriptors: Second Language Learning, Rating Scales, Computational Linguistics, Persuasive Discourse

Reviewing the Review: An Assessment of Dissertation Reviewer Feedback Quality

Peer reviewed

Lehan, Tara; Hussey, Heather; Mika, Eva – Journal of University Teaching and Learning Practice, 2016

Throughout the dissertation process, the chair and committee members provide feedback regarding quality to help the doctoral candidate to produce the highest-quality document and become an independent scholar. Nevertheless, results of previous research suggest that overall dissertation quality generally is poor. Because much of the feedback about…

Descriptors: Graduate Students, Doctoral Dissertations, Student Evaluation, Feedback (Response)

How Do Raters Judge Spoken Vocabulary?

Peer reviewed

Li, Hui – English Language Teaching, 2016

The aim of the study was to investigate how raters come to their decisions when judging spoken vocabulary. Segmental rating was introduced to quantify raters' decision-making process. It is hoped that this simulated study brings fresh insight to future methodological considerations with spoken data. Twenty trainee raters assessed five Chinese…

Descriptors: Foreign Countries, Evaluators, Interrater Reliability, Decision Making

Linguistic Features of Humor in Academic Writing

Peer reviewed

Skalicky, Stephen; Berger, Cynthia M.; Crossley, Scott A.; McNamara, Danielle S. – Advances in Language and Literary Studies, 2016

A corpus of 313 freshman college essays was analyzed in order to better understand the forms and functions of humor in academic writing. Human ratings of humor and wordplay were statistically aggregated using Factor Analysis to provide an overall "Humor" component score for each essay in the corpus. In addition, the essays were also…

Descriptors: Discourse Analysis, Academic Discourse, Humor, Writing (Composition)

Assessing English Language Learners' Oral Performance: A Comparison of Monologue, Interview, and Group Oral Test

Peer reviewed

Ahmadi, Alireza; Sadeghi, Elham – Language Assessment Quarterly, 2016

In the present study we investigated the effect of test format on oral performance in terms of test scores and discourse features (accuracy, fluency, and complexity). Moreover, we explored how the scores obtained on different test formats relate to such features. To this end, 23 Iranian EFL learners participated in three test formats of monologue,…

Descriptors: Oral Language, Comparative Analysis, Language Fluency, Accuracy

Grounding Lexical Diversity in Human Judgments

Peer reviewed

Jarvis, Scott – Language Testing, 2017

The present study discusses the relevance of measures of lexical diversity (LD) to the assessment of learner corpora. It also argues that existing measures of LD, many of which have become specialized for use with language corpora, are fundamentally measures of lexical repetition, are based on an etic perspective of language, and lack construct…

Descriptors: Computational Linguistics, English (Second Language), Second Language Learning, Native Speakers

A Validation Study of Classroom Assessment Scoring System-Secondary in the Finnish School Context

Peer reviewed

Virtanen, T. E.; Pakarinen, E.; Lerkkanen, M.-K.; Poikkeus, A.-M.; Siekkinen, M.; Nurmi, J.-E. – Journal of Early Adolescence, 2018

This study examined the reliability and validity of the Classroom Assessment Scoring System-Secondary (CLASS-S) in Finnish classrooms. Trained observers coded classroom interactions based on video recordings of 46 Grade 6 classrooms (450 cycles). Concurrent associations were investigated with respect to teacher self-ratings (e.g., efficacy beliefs…

Descriptors: Factor Analysis, Classroom Observation Techniques, Foreign Countries, Factor Structure

Evaluation by Native and Non-Native English Teacher-Raters of Japanese Students' Summaries

Peer reviewed

Hijikata-Someya, Yuko; Ono, Masumi; Yamanishi, Hiroyuki – English Language Teaching, 2015

Although the importance of summary writing is well documented in prior studies, few have investigated the evaluation of written summaries. Due to the complex nature of L2 summary writing, which requires one to read the original material and summarize its content in the L2, raters often emphasize different features when judging the quality of L2…

Descriptors: Foreign Countries, English (Second Language), Second Language Instruction, Second Language Learning
