Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 11
Since 2006 (last 20 years): 28
Descriptor
Essay Tests: 81
Interrater Reliability: 81
Scoring: 41
Writing Evaluation: 40
Test Reliability: 23
Higher Education: 21
Evaluators: 18
Holistic Evaluation: 16
Scores: 16
Correlation: 15
Writing Tests: 14
Author
Breland, Hunter M.: 3
Anderson, Judith A.: 2
Busch, John Christian: 2
Ferrara, Steven F.: 2
Lunz, Mary E.: 2
Mitchell, Karen J.: 2
Stahl, John A.: 2
Wolfe, Edward W.: 2
Zhang, Mo: 2
Ackerman, Terry A.: 1
Aghbar, Ali-Asghar: 1
Education Level
Higher Education: 16
Postsecondary Education: 14
Secondary Education: 4
Elementary Secondary Education: 3
High Schools: 2
Adult Education: 1
Grade 10: 1
Grade 7: 1
Audience
Researchers: 9
Location
Arizona: 1
Australia: 1
Hong Kong: 1
Iran: 1
Japan: 1
Kuwait: 1
Michigan: 1
North Carolina: 1
Pennsylvania: 1
South Korea: 1
Turkey: 1
Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023
This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…
Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification
Atilgan, Hakan – Eurasian Journal of Educational Research, 2019
Purpose: This study intended to examine the generalizability and reliability of essay ratings within the scope of the generalizability (G) theory. Specifically, the effect of raters on the generalizability and reliability of students' essay ratings was examined. Furthermore, variations of the generalizability and reliability coefficients with…
Descriptors: Foreign Countries, Essay Tests, Test Reliability, Interrater Reliability
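Atilgan's analysis rests on generalizability (G) theory. As a rough guide to the coefficients the abstract refers to, the sketch below estimates variance components and G coefficients for a one-facet crossed (essays x raters) design; the rating matrix and rater counts are invented for illustration and are not values from the study.

# One-facet (persons x raters) G-study sketch; the rating matrix is invented.
import numpy as np

ratings = np.array([  # rows = essays (persons), columns = raters
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [3, 3, 2],
    [4, 4, 5],
], dtype=float)

n_p, n_r = ratings.shape
grand = ratings.mean()
ss_p = n_r * np.sum((ratings.mean(axis=1) - grand) ** 2)
ss_r = n_p * np.sum((ratings.mean(axis=0) - grand) ** 2)
ss_res = np.sum((ratings - grand) ** 2) - ss_p - ss_r

ms_p = ss_p / (n_p - 1)
ms_r = ss_r / (n_r - 1)
ms_res = ss_res / ((n_p - 1) * (n_r - 1))

# Variance components (negative estimates truncated at zero).
var_res = ms_res
var_r = max((ms_r - ms_res) / n_p, 0.0)
var_p = max((ms_p - ms_res) / n_r, 0.0)

# Relative (G) and absolute (phi) coefficients for a decision study with k raters.
k = 2
g_coef = var_p / (var_p + var_res / k)
phi_coef = var_p / (var_p + (var_r + var_res) / k)
print(f"G = {g_coef:.3f}, phi = {phi_coef:.3f}")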
Erguvan, Inan Deniz; Aksu Dunya, Beyza – Language Testing in Asia, 2020
This study examined the rater severity of instructors using a multi-trait rubric in a freshman composition course offered at a private university in Kuwait. The use of standardized multi-trait rubrics is a recent development in this course, and student feedback and the anchor papers provided by instructors for each essay exam necessitated the assessment of…
Descriptors: Foreign Countries, College Freshmen, Freshman Composition, Writing Evaluation
Michelle Herridge – ProQuest LLC, 2021
Evaluating student written work on summative assessments is a critical task for instructors at all educational levels. Nevertheless, few research studies provide insight into how different instructors approach this task. Chemistry faculty instructors (FIs) and graduate student instructors (GSIs) regularly engage in the…
Descriptors: Science Instruction, Chemistry, College Faculty, Teaching Assistants
Finn, Bridgid; Wendler, Cathy; Ricker-Pedley, Kathryn L.; Arslan, Burcu – ETS Research Report Series, 2018
This report investigates whether the time between scoring sessions influences operational and nonoperational scoring accuracy. The study evaluates raters' scoring accuracy on constructed-response essays for the "GRE"® General Test. Binomial linear mixed-effects models are presented that evaluate how the effect of various…
Descriptors: Intervals, Scoring, Accuracy, Essay Tests
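The question in this report can be approximated in a few lines. The sketch below fits a plain logistic regression of scoring accuracy on the between-session gap; the report itself uses binomial linear mixed-effects models with rater-level effects, and all column names and values here are invented.

# Simplified analogue of the analysis described above: does the gap (in days)
# between scoring sessions predict whether a rater's score was accurate?
# Data and column names are invented; the report fits binomial mixed-effects models.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "correct":  [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1],   # 1 = rater matched the reference score
    "gap_days": [1, 2, 30, 3, 45, 5, 2, 60, 7, 4, 90, 1],
})

# Logistic regression of scoring accuracy on the between-session gap.
model = smf.logit("correct ~ gap_days", data=df).fit(disp=False)
print(model.summary())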
Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018
Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…
Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
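The ranking issue raised here can be made concrete with a small example: compute a per-item agreement statistic (quadratic weighted kappa) for each automated rater against human scores, then rank the raters under different aggregation rules. The engine names and all scores below are invented for illustration.

# Per-item quadratic weighted kappa for two hypothetical automated raters,
# ranked under two aggregation rules; all data are invented.
import numpy as np
from sklearn.metrics import cohen_kappa_score

human = {
    "item1": [1, 2, 3, 2, 4, 3, 1, 2],
    "item2": [2, 3, 4, 4, 1, 2, 3, 3],
    "item3": [0, 1, 2, 2, 3, 1, 0, 2],
}
machines = {
    "engine_A": {"item1": [1, 2, 3, 3, 4, 3, 2, 2],
                 "item2": [2, 3, 3, 4, 1, 2, 3, 2],
                 "item3": [0, 1, 2, 1, 3, 2, 0, 2]},
    "engine_B": {"item1": [1, 2, 2, 2, 4, 4, 1, 2],
                 "item2": [3, 3, 4, 4, 2, 2, 3, 3],
                 "item3": [1, 1, 2, 2, 2, 1, 0, 1]},
}

# Agreement with human scores, item by item.
per_item = {name: [cohen_kappa_score(human[i], s[i], weights="quadratic") for i in human]
            for name, s in machines.items()}

# Rank engines under two aggregation rules; the ordering can depend on the rule chosen.
for agg in (np.mean, np.median):
    ranking = sorted(per_item, key=lambda n: agg(per_item[n]), reverse=True)
    print(agg.__name__, {n: round(float(agg(per_item[n])), 3) for n in ranking})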
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Ahmadi Shirazi, Masoumeh – SAGE Open, 2019
Threats to construct validity should be reduced to a minimum. To that end, sources of bias, namely raters, items, and tests, as well as gender, age, race, language background, culture, and socioeconomic status, need to be identified and removed. This study investigates raters' experience, language background, and the choice of essay prompt as potential…
Descriptors: Foreign Countries, Language Tests, Test Bias, Essay Tests
Wu, Siew Mei; Tan, Susan – Higher Education Research and Development, 2016
Rating essays is a complex task where students' grades could be adversely affected by test-irrelevant factors such as rater characteristics and rating scales. Understanding these factors and controlling their effects are crucial for test validity. Rater behaviour has been extensively studied through qualitative methods such as questionnaires and…
Descriptors: Scoring, Item Response Theory, Student Placement, College Students
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M. – ETS Research Report Series, 2015
Automated scoring models were trained and evaluated for the essay task in the "Praxis I"® writing test. Prompt-specific and generic "e-rater"® scoring models were built, and evaluation statistics, such as quadratic weighted kappa, Pearson correlation, and standardized differences in mean scores, were examined to evaluate the…
Descriptors: Writing Tests, Licensing Examinations (Professions), Teacher Competency Testing, Scoring
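The evaluation statistics named in this abstract (quadratic weighted kappa, Pearson correlation, standardized mean score difference) can be reproduced on any pair of human and machine score vectors. The values below are invented for illustration and are not Praxis data.

# Agreement statistics between hypothetical human and automated essay scores.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

human   = np.array([3, 4, 2, 5, 3, 4, 2, 3, 4, 5])
machine = np.array([3, 4, 3, 5, 2, 4, 2, 4, 4, 4])

qwk = cohen_kappa_score(human, machine, weights="quadratic")
r, _ = pearsonr(human, machine)
# Standardized mean score difference (machine minus human, pooled SD).
pooled_sd = np.sqrt((human.var(ddof=1) + machine.var(ddof=1)) / 2)
smd = (machine.mean() - human.mean()) / pooled_sd

print(f"QWK = {qwk:.3f}, Pearson r = {r:.3f}, standardized difference = {smd:.3f}")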
Rios, Joseph A.; Sparks, Jesse R.; Zhang, Mo; Liu, Ou Lydia – ETS Research Report Series, 2017
Proficiency with written communication (WC) is critical for success in college and careers. As a result, institutions face a growing challenge to accurately evaluate their students' writing skills to obtain data that can support demands of accreditation, accountability, or curricular improvement. Many current standardized measures, however, lack…
Descriptors: Test Construction, Test Validity, Writing Tests, College Outcomes Assessment
Zhang, Mo – ETS Research Report Series, 2013
Many testing programs use automated scoring to grade essays. One issue in automated essay scoring that has not been examined adequately is population invariance and its causes. The primary purpose of this study was to investigate the impact of sampling in model calibration on population invariance of automated scores. This study analyzed scores…
Descriptors: Automation, Scoring, Essay Tests, Sampling
Attali, Yigal; Lewis, Will; Steier, Michael – Language Testing, 2013
Automated essay scoring can produce reliable scores that are highly correlated with human scores, but is limited in its evaluation of content and other higher-order aspects of writing. The increased use of automated essay scoring in high-stakes testing underscores the need for human scoring that is focused on higher-order aspects of writing. This…
Descriptors: Scoring, Essay Tests, Reliability, High Stakes Tests
Hale, Chris C. – Language Testing in Asia, 2015
Student self-assessment has been heralded as a way of increasing student ownership of the learning process, enhancing metacognitive awareness of learning progress, and promoting learner autonomy. In a university setting, where a major aim is to promote critical thinking and attentiveness to one's responsibility in an academic…
Descriptors: Self Evaluation (Individuals), Learning Processes, Metacognition, Personal Autonomy