ERIC - Search Results

Showing 1 to 15 of 33 results

A Systematic Review of Methods for Evaluating Rating Quality in Language Assessment

Peer reviewed

Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018

The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on…

Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability

Kappa and Rater Accuracy: Paradigms and Parameters

Peer reviewed

Conger, Anthony J. – Educational and Psychological Measurement, 2017

Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…

Descriptors: Interrater Reliability, Evaluators, Accuracy, Statistical Analysis
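
As an illustration of the point in the abstract above, the following is a minimal sketch (not taken from the article) of Cohen's kappa for two raters, using the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's category marginals. The function name `cohens_kappa` and the pass/fail ratings are hypothetical and chosen only for the example.

```python
from collections import Counter
import math


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one category per item."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items given the same category by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the two raters' marginal proportions,
    # summed over categories.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if math.isclose(expected, 1.0):
        # Both raters used a single identical category; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical ratings: both pairs agree on 8 of 10 items.
    balanced_a = ["pass"] * 5 + ["fail"] * 5
    balanced_b = ["pass"] * 4 + ["fail"] * 5 + ["pass"]
    skewed_a = ["pass"] * 9 + ["fail"]
    skewed_b = ["pass"] * 8 + ["fail", "pass"]

    print(cohens_kappa(balanced_a, balanced_b))  # evenly split marginals
    print(cohens_kappa(skewed_a, skewed_b))      # heavily skewed marginals
```

Both hypothetical pairs have the same observed agreement (0.8), yet kappa is about 0.6 with evenly split marginals and slightly below zero with skewed marginals, because chance agreement rises from 0.5 to 0.82.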

Reviewing the Review: An Assessment of Dissertation Reviewer Feedback Quality

Peer reviewed

Lehan, Tara; Hussey, Heather; Mika, Eva – Journal of University Teaching and Learning Practice, 2016

Throughout the dissertation process, the chair and committee members provide feedback regarding quality to help the doctoral candidate to produce the highest-quality document and become an independent scholar. Nevertheless, results of previous research suggest that overall dissertation quality generally is poor. Because much of the feedback about…

Descriptors: Graduate Students, Doctoral Dissertations, Student Evaluation, Feedback (Response)

Appraising the Scoring Performance of Automated Essay Scoring Systems--Some Additional Considerations: Which Essays? Which Human Raters? Which Scores?

Peer reviewed

Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018

The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…

Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators

A Validation Study of Classroom Assessment Scoring System-Secondary in the Finnish School Context

Peer reviewed

Virtanen, T. E.; Pakarinen, E.; Lerkkanen, M.-K.; Poikkeus, A.-M.; Siekkinen, M.; Nurmi, J.-E. – Journal of Early Adolescence, 2018

This study examined the reliability and validity of the Classroom Assessment Scoring System-Secondary (CLASS-S) in Finnish classrooms. Trained observers coded classroom interactions based on video recordings of 46 Grade 6 classrooms (450 cycles). Concurrent associations were investigated with respect to teacher self-ratings (e.g., efficacy beliefs…

Descriptors: Factor Analysis, Classroom Observation Techniques, Foreign Countries, Factor Structure

Using Subjective and Objective Measures to Predict Level of Reading Fluency at the End of First Grade

Peer reviewed

Morris, Darrell; Pennell, Ashley M.; Perney, Jan; Trathen, Woodrow – Reading Psychology, 2018

This study compared reading rate to reading fluency (as measured by a rating scale). After listening to first graders read short passages, we assigned an overall fluency rating (low, average, or high) to each reading. We then used predictive discriminant analyses to determine which of five measures--accuracy, rate (objective); accuracy, phrasing,…

Descriptors: Reading Fluency, Prediction, Grade 1, Elementary School Students

Effects of Analytical and Holistic Scoring Patterns on Scorer Reliability in Biology Essay Tests

Peer reviewed

Ebuoh, Casmir N. – World Journal of Education, 2018

Literature revealed that the patterns/methods of scoring essay tests had been criticized for not being reliable and this unreliability is more likely to be more in internal examinations than in the external examinations. The purpose of this study is to find out the effects of analytical and holistic scoring patterns on scorer reliability in…

Descriptors: Holistic Approach, Scoring, Essay Tests, Biology

The Impact of Rater Variability on Relationships among Different Effect-Size Indices for Inter-Rater Agreement between Human and Automated Essay Scoring

Yun, Jiyeo – ProQuest LLC, 2017

Since researchers investigated automatic scoring systems in writing assessments, they have dealt with relationships between human and machine scoring, and then have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…

Descriptors: Interrater Reliability, Essays, Scoring, Evaluators

Teachers' Cloud-Based Learning Designs: The Development of a Guiding Rubric Using the TPACK Framework

Peer reviewed

Al-Harthi, Aisha Salim Ali; Campbell, Chris; Karimi, Arafeh – Computers in the Schools, 2018

This study aimed to develop, validate, and trial a rubric for evaluating the cloud-based learning designs (CBLD) that were developed by teachers using virtual learning environments. The rubric was developed using the technological pedagogical content knowledge (TPACK) framework, with rubric development including content and expert validation of…

Descriptors: Computer Assisted Instruction, Scoring Rubrics, Interrater Reliability, Content Validity

Evaluating Comparative Judgment as an Approach to Essay Scoring

Peer reviewed

Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016

As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…

Descriptors: Essays, Scoring, Comparative Analysis, Evaluators

The Influence of Training and Experience on Rater Performance in Scoring Spoken Language

Peer reviewed

Davis, Larry – Language Testing, 2016

Two factors were investigated that are thought to contribute to consistency in rater scoring judgments: rater training and experience in scoring. Also considered were the relative effects of scoring rubrics and exemplars on rater performance. Experienced teachers of English (N = 20) scored recorded responses from the TOEFL iBT speaking test prior…

Descriptors: Evaluators, Oral Language, Scores, Language Tests

Functional Adequacy in L2 Writing: Towards a New Rating Scale

Peer reviewed

Kuiken, Folkert; Vedder, Ineke – Language Testing, 2017

The importance of functional adequacy as an essential component of L2 proficiency has been observed by several authors (Pallotti, 2009; De Jong, Steinel, Florijn, Schoonen, & Hulstijn, 2012a, b). The rationale underlying the present study is that the assessment of writing proficiency in L2 is not fully possible without taking into account the…

Descriptors: Second Language Learning, Rating Scales, Computational Linguistics, Persuasive Discourse

Multiple Mini-Interviews in the Age of the Internet: Does Preparation Help Applicants to Medical School?

Peer reviewed

Moshinsky, Avital; Ziegler, David; Gafni, Naomi – International Journal of Testing, 2017

Many medical schools have adopted multiple mini-interviews (MMI) as an advanced selection tool. MMIs are expensive and used to test only a few dozen candidates per day, making it infeasible to develop a different test version for each test administration. Therefore, some items are reused both within and across years. This study investigated the…

Descriptors: Interviews, Medical Schools, Test Validity, Test Reliability

How Do Raters Judge Spoken Vocabulary?

Peer reviewed

Li, Hui – English Language Teaching, 2016

The aim of the study was to investigate how raters come to their decisions when judging spoken vocabulary. Segmental rating was introduced to quantify raters' decision-making process. It is hoped that this simulated study brings fresh insight to future methodological considerations with spoken data. Twenty trainee raters assessed five Chinese…

Descriptors: Foreign Countries, Evaluators, Interrater Reliability, Decision Making

Linguistic Features of Humor in Academic Writing

Peer reviewed

Skalicky, Stephen; Berger, Cynthia M.; Crossley, Scott A.; McNamara, Danielle S. – Advances in Language and Literary Studies, 2016

A corpus of 313 freshman college essays was analyzed in order to better understand the forms and functions of humor in academic writing. Human ratings of humor and wordplay were statistically aggregated using Factor Analysis to provide an overall "Humor" component score for each essay in the corpus. In addition, the essays were also…

Descriptors: Discourse Analysis, Academic Discourse, Humor, Writing (Composition)
