Showing 1 to 15 of 36 results
Peer reviewed
PDF on ERIC
Doewes, Afrizal; Kurdhi, Nughthoh Arfawi; Saxena, Akrati – International Educational Data Mining Society, 2023
Automated Essay Scoring (AES) tools aim to improve the efficiency and consistency of essay scoring by using machine learning algorithms. In existing research on this topic, most researchers agree that human-automated score agreement remains the benchmark for assessing the accuracy of machine-generated scores. To measure the performance of…
Descriptors: Essays, Writing Evaluation, Evaluators, Accuracy
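A side note on the agreement benchmark this entry mentions: quadratic weighted kappa (QWK) is one widely used human-automated agreement index in the AES literature, though the abstract does not name a specific statistic. The sketch below is illustrative only; the score vectors and score range are hypothetical, not the study's data.

```python
# Quadratic weighted kappa (QWK): a common human-machine agreement
# statistic in AES research. All data below are hypothetical.
import numpy as np

def quadratic_weighted_kappa(human, machine, min_score, max_score):
    n = max_score - min_score + 1
    # Observed agreement matrix: counts of (human, machine) score pairs
    O = np.zeros((n, n))
    for h, m in zip(human, machine):
        O[h - min_score, m - min_score] += 1
    # Expected matrix from the two marginal score distributions
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    # Quadratic weights penalize large disagreements more than small ones
    idx = np.arange(n)
    W = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2
    return 1.0 - (W * O).sum() / (W * E).sum()

human_scores   = [3, 4, 2, 5, 3, 4, 1, 3]   # hypothetical human ratings
machine_scores = [3, 4, 3, 5, 2, 4, 1, 3]   # hypothetical AES ratings
print(quadratic_weighted_kappa(human_scores, machine_scores, 1, 5))
```

QWK approaches 1 when machine scores track human scores closely and falls toward 0 (or below) as disagreements grow, which is why it is often preferred over raw percent agreement for ordinal essay scores.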
Peer reviewed
PDF on ERIC
Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025
As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT-4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment is therefore of considerable significance. This study explored the comparative…
Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy
Peer reviewed
Direct link
Zahn, Daniela; Canton, Ursula; Boyd, Victoria; Hamilton, Laura; Mamo, Josianne; McKay, Jane; Proudfoot, Linda; Telfer, Dickson; Williams, Kim; Wilson, Colin – Studies in Higher Education, 2021
Evaluating the impact of Academic Literacies teaching (Lea and Street [1998. "Student Writing in Higher Education: An Academic Literacies Approach." "Studies in Higher Education" 23 (2): 157-72. doi:10.1080/03075079812331380364]) is difficult, as it involves gauging whether writers: (1) gain better understanding of what…
Descriptors: Writing Evaluation, Evaluation Methods, Undergraduate Students, Foreign Countries
Peer reviewed
Direct link
Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023
This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…
Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification
Jiyeo Yun – English Teaching, 2023
Studies of automatic scoring systems in writing assessment have evaluated the relationship between human and machine scores to establish the reliability of automated essay scoring. This study investigated the magnitudes of indices of inter-rater agreement and discrepancy, especially between human and machine scoring, in writing assessment…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
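Because this entry turns on the magnitudes of agreement and discrepancy indices, a minimal sketch of a few indices commonly reported in human-machine scoring comparisons may help fix ideas. The score vectors are hypothetical, and the choice of indices is an assumption; the meta-analysis itself covers a broader set.

```python
# Common indices when comparing human and machine essay scores;
# the score vectors below are hypothetical.
import numpy as np

human   = np.array([4, 3, 5, 2, 4, 3, 4, 5])
machine = np.array([4, 3, 4, 2, 5, 3, 3, 5])

exact    = np.mean(human == machine)              # exact agreement rate
adjacent = np.mean(np.abs(human - machine) <= 1)  # within-one-point rate
r        = np.corrcoef(human, machine)[0, 1]      # Pearson correlation
mean_gap = np.mean(machine - human)               # signed discrepancy

print(f"exact={exact:.2f} adjacent={adjacent:.2f} r={r:.2f} gap={mean_gap:+.2f}")
```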
Peer reviewed
Direct link
Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024
This paper proposes using depth perception to represent raters' decisions in the holistic evaluation of ESL essays, as an alternative medium to the conventional form of numerical scores. The researchers verified the new method's accuracy and inter-/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…
Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy
Peer reviewed
Direct link
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to that of person fit analyses: to identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
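Score-resolution procedures vary by testing program; below is a minimal sketch of one common discrepancy-threshold rule, stated as an illustrative assumption rather than the procedure these authors study. Two ratings within one point are averaged; larger gaps are adjudicated by a third rater, keeping the two closest scores.

```python
# One common score-resolution rule (an illustrative assumption, not the
# authors' procedure): average ratings within one point; otherwise call
# in a third rater and average the two closest scores.
def resolve(r1: int, r2: int, third_rater=None) -> float:
    if abs(r1 - r2) <= 1:
        return (r1 + r2) / 2
    r3 = third_rater(r1, r2)  # adjudicating rating for discrepant pairs
    pair = min([(r1, r3), (r2, r3)], key=lambda p: abs(p[0] - p[1]))
    return sum(pair) / 2

# Hypothetical discrepant case: raters give 2 and 5, adjudicator gives 4.
print(resolve(2, 5, third_rater=lambda a, b: 4))  # -> 4.5
```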
Peer reviewed
Direct link
Li, Wentao – Reading and Writing: An Interdisciplinary Journal, 2022
Scoring rubrics are known to be effective for assessing writing for both testing and classroom teaching purposes. How raters interpret the descriptors in a rubric can significantly impact the subsequent final score, and further, the descriptors may also color a rater's judgment of a student's writing quality. Little is known, however, about how…
Descriptors: Scoring Rubrics, Interrater Reliability, Writing Evaluation, Teaching Methods
Peer reviewed
Direct link
Erguvan, Inan Deniz; Aksu Dunya, Beyza – Language Testing in Asia, 2020
This study examined the rater severity of instructors using a multi-trait rubric in a freshman composition course offered at a private university in Kuwait. Use of standardized multi-trait rubrics is a recent development in this course, and student feedback and anchor papers provided by instructors for each essay exam necessitated the assessment of…
Descriptors: Foreign Countries, College Freshmen, Freshman Composition, Writing Evaluation
Peer reviewed
PDF on ERIC
Erman Aslanoglu, Aslihan; Sata, Mehmet – Participatory Educational Research, 2021
When students produce writing tasks that require higher-order thinking skills, one of the most important problems is scoring those tasks objectively. Raters giving scores below or above a student's actual performance, depending on several environmental factors, undermines the consistency of the measurements. Inconsistencies in scoring…
Descriptors: Interrater Reliability, Evaluators, Error of Measurement, Writing Evaluation
Peer reviewed
Direct link
Lamprianou, Iasonas; Tsagari, Dina; Kyriakou, Nansia – Language Testing, 2021
This longitudinal study (2002-2014) investigates the stability of rating characteristics of a large group of raters over time in the context of the writing paper of a national high-stakes examination. The study uses one measure of rater severity and two measures of rater consistency. The results suggest that the rating characteristics of…
Descriptors: Longitudinal Studies, Evaluators, High Stakes Tests, Writing Evaluation
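This entry refers to one measure of rater severity and two measures of rater consistency without naming them; studies of this kind typically use many-facet Rasch measurement. As a simplified proxy only (an assumption, not the study's method), severity can be sketched as a rater's mean deviation from each essay's all-rater mean, and consistency as the correlation with those means.

```python
# Simplified proxies (assumptions; the study itself likely uses
# Rasch-based measures): severity = mean deviation of a rater's scores
# from each essay's all-rater mean; consistency = correlation with it.
import numpy as np

# rows = raters, columns = essays (hypothetical 3 raters x 5 essays)
scores = np.array([[3, 4, 2, 5, 3],
                   [2, 3, 2, 4, 2],    # rater 1 looks severe
                   [4, 4, 3, 5, 4]])   # rater 2 looks lenient

essay_means = scores.mean(axis=0)
for i, row in enumerate(scores):
    severity    = (row - essay_means).mean()        # negative = severe
    consistency = np.corrcoef(row, essay_means)[0, 1]
    print(f"rater {i}: severity={severity:+.2f} consistency={consistency:.2f}")
```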
Peer reviewed
Direct link
Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018
Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…
Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring
Peer reviewed
Direct link
Wang, Qiao – Education and Information Technologies, 2022
This study searched for open-source semantic similarity tools and evaluated their effectiveness in automated content scoring of fact-based essays written by English-as-a-Foreign-Language (EFL) learners. Fifty writing samples under a fact-based writing task from an academic English course in a Japanese university were collected and a gold standard…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Scoring
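To make the content-scoring setup in this entry concrete: the sketch below scores essays by their similarity to a gold-standard reference text. TF-IDF cosine similarity stands in for the open-source semantic similarity tools the study actually surveys (an assumption for illustration; the texts and gold standard here are invented, not the study's data).

```python
# Minimal content-scoring sketch: rank essays by similarity to a
# gold-standard reference. TF-IDF cosine similarity is a stand-in for
# the semantic similarity tools the study evaluates.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

gold = "Plants convert sunlight, water, and carbon dioxide into glucose."
essays = [
    "Plants use sunlight and water to make glucose from carbon dioxide.",
    "My favourite season is summer because the days are long.",
]

vec = TfidfVectorizer().fit([gold] + essays)
gold_v  = vec.transform([gold])
essay_v = vec.transform(essays)
for essay, sim in zip(essays, cosine_similarity(essay_v, gold_v).ravel()):
    print(f"{sim:.2f}  {essay[:40]}...")
```

The on-topic essay scores well above the off-topic one, which is the basic signal a fact-based content scorer exploits.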
Peer reviewed
Direct link
Jeong, Heejeong – Language Testing in Asia, 2019
In writing assessment, finding a valid, reliable, and efficient scale is critical. Appropriate scales increase rater reliability and can also save time and money. This exploratory study compared the effects of a binary scale and an analytic scale across teacher raters and expert raters. The purpose of the study is to find out how different scale…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
Peer reviewed
PDF on ERIC
Rupp, André A.; Casabianca, Jodi M.; Krüger, Maleika; Keller, Stefan; Köller, Olaf – ETS Research Report Series, 2019
In this research report, we describe the design and empirical findings for a large-scale study of essay writing ability with approximately 2,500 high school students in Germany and Switzerland on the basis of 2 tasks with 2 associated prompts, each from a standardized writing assessment whose scoring involved both human and automated components.…
Descriptors: Automation, Foreign Countries, English (Second Language), Language Tests