Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 11
Since 2006 (last 20 years): 28
Descriptor
Essay Tests: 81
Interrater Reliability: 81
Scoring: 41
Writing Evaluation: 40
Test Reliability: 23
Higher Education: 21
Evaluators: 18
Holistic Evaluation: 16
Scores: 16
Correlation: 15
Writing Tests: 14
Author
Breland, Hunter M.: 3
Anderson, Judith A.: 2
Busch, John Christian: 2
Ferrara, Steven F.: 2
Lunz, Mary E.: 2
Mitchell, Karen J.: 2
Stahl, John A.: 2
Wolfe, Edward W.: 2
Zhang, Mo: 2
Ackerman, Terry A.: 1
Aghbar, Ali-Asghar: 1
Education Level
Higher Education: 16
Postsecondary Education: 14
Secondary Education: 4
Elementary Secondary Education: 3
High Schools: 2
Adult Education: 1
Grade 10: 1
Grade 7: 1
Audience
Researchers: 9
Location
Arizona: 1
Australia: 1
Hong Kong: 1
Iran: 1
Japan: 1
Kuwait: 1
Michigan: 1
North Carolina: 1
Pennsylvania: 1
South Korea: 1
Turkey: 1
Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023
This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…
Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification
Atilgan, Hakan – Eurasian Journal of Educational Research, 2019
Purpose: This study intended to examine the generalizability and reliability of essay ratings within the scope of the generalizability (G) theory. Specifically, the effect of raters on the generalizability and reliability of students' essay ratings was examined. Furthermore, variations of the generalizability and reliability coefficients with…
Descriptors: Foreign Countries, Essay Tests, Test Reliability, Interrater Reliability
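Atilgan's analysis rests on generalizability (G) theory. As a rough guide to the coefficients the abstract refers to, the sketch below estimates variance components and G coefficients for a one-facet crossed (essays x raters) design; the rating matrix and rater counts are invented for illustration and are not values from the study.

# One-facet (persons x raters) G-study sketch; the rating matrix is invented.
import numpy as np

ratings = np.array([  # rows = essays (persons), columns = raters
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [3, 3, 2],
    [4, 4, 5],
], dtype=float)

n_p, n_r = ratings.shape
grand = ratings.mean()
ss_p = n_r * np.sum((ratings.mean(axis=1) - grand) ** 2)
ss_r = n_p * np.sum((ratings.mean(axis=0) - grand) ** 2)
ss_res = np.sum((ratings - grand) ** 2) - ss_p - ss_r

ms_p = ss_p / (n_p - 1)
ms_r = ss_r / (n_r - 1)
ms_res = ss_res / ((n_p - 1) * (n_r - 1))

# Variance components (negative estimates truncated at zero).
var_res = ms_res
var_r = max((ms_r - ms_res) / n_p, 0.0)
var_p = max((ms_p - ms_res) / n_r, 0.0)

# Relative (G) and absolute (phi) coefficients for a decision study with k raters.
k = 2
g_coef = var_p / (var_p + var_res / k)
phi_coef = var_p / (var_p + (var_r + var_res) / k)
print(f"G = {g_coef:.3f}, phi = {phi_coef:.3f}")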
Erguvan, Inan Deniz; Aksu Dunya, Beyza – Language Testing in Asia, 2020
This study examined the rater severity of instructors using a multi-trait rubric in a freshman composition course offered at a private university in Kuwait. The use of standardized multi-trait rubrics is a recent development in this course, and student feedback and the anchor papers provided by instructors for each essay exam necessitated the assessment of…
Descriptors: Foreign Countries, College Freshmen, Freshman Composition, Writing Evaluation
Michelle Herridge – ProQuest LLC, 2021
Evaluating student written work on summative assessments is a critical task for instructors at all educational levels. Nevertheless, few research studies provide insight into how different instructors approach this task. Chemistry faculty instructors (FIs) and graduate student instructors (GSIs) regularly engage in the…
Descriptors: Science Instruction, Chemistry, College Faculty, Teaching Assistants
Finn, Bridgid; Wendler, Cathy; Ricker-Pedley, Kathryn L.; Arslan, Burcu – ETS Research Report Series, 2018
This report investigates whether the time between scoring sessions influences operational and nonoperational scoring accuracy. The study evaluates raters' scoring accuracy on constructed-response essays for the "GRE"® General Test. Binomial linear mixed-effects models are presented that evaluate how the effect of various…
Descriptors: Intervals, Scoring, Accuracy, Essay Tests
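The question in this report can be approximated in a few lines. The sketch below fits a plain logistic regression of scoring accuracy on the between-session gap; the report itself uses binomial linear mixed-effects models with rater-level effects, and all column names and values here are invented.

# Simplified analogue of the analysis described above: does the gap (in days)
# between scoring sessions predict whether a rater's score was accurate?
# Data and column names are invented; the report fits binomial mixed-effects models.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "correct":  [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1],   # 1 = rater matched the reference score
    "gap_days": [1, 2, 30, 3, 45, 5, 2, 60, 7, 4, 90, 1],
})

# Logistic regression of scoring accuracy on the between-session gap.
model = smf.logit("correct ~ gap_days", data=df).fit(disp=False)
print(model.summary())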
Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018
Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…
Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
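The ranking issue raised here can be made concrete with a small example: compute a per-item agreement statistic (quadratic weighted kappa) for each automated rater against human scores, then rank the raters under different aggregation rules. The engine names and all scores below are invented for illustration.

# Per-item quadratic weighted kappa for two hypothetical automated raters,
# ranked under two aggregation rules; all data are invented.
import numpy as np
from sklearn.metrics import cohen_kappa_score

human = {
    "item1": [1, 2, 3, 2, 4, 3, 1, 2],
    "item2": [2, 3, 4, 4, 1, 2, 3, 3],
    "item3": [0, 1, 2, 2, 3, 1, 0, 2],
}
machines = {
    "engine_A": {"item1": [1, 2, 3, 3, 4, 3, 2, 2],
                 "item2": [2, 3, 3, 4, 1, 2, 3, 2],
                 "item3": [0, 1, 2, 1, 3, 2, 0, 2]},
    "engine_B": {"item1": [1, 2, 2, 2, 4, 4, 1, 2],
                 "item2": [3, 3, 4, 4, 2, 2, 3, 3],
                 "item3": [1, 1, 2, 2, 2, 1, 0, 1]},
}

# Agreement with human scores, item by item.
per_item = {name: [cohen_kappa_score(human[i], s[i], weights="quadratic") for i in human]
            for name, s in machines.items()}

# Rank engines under two aggregation rules; the ordering can depend on the rule chosen.
for agg in (np.mean, np.median):
    ranking = sorted(per_item, key=lambda n: agg(per_item[n]), reverse=True)
    print(agg.__name__, {n: round(float(agg(per_item[n])), 3) for n in ranking})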
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Ahmadi Shirazi, Masoumeh – SAGE Open, 2019
Threats to construct validity should be reduced to a minimum. To that end, sources of bias, namely raters, items, and tests, as well as gender, age, race, language background, culture, and socioeconomic status, need to be identified and removed. This study investigates raters' experience, language background, and the choice of essay prompt as potential…
Descriptors: Foreign Countries, Language Tests, Test Bias, Essay Tests
Wu, Siew Mei; Tan, Susan – Higher Education Research and Development, 2016
Rating essays is a complex task where students' grades could be adversely affected by test-irrelevant factors such as rater characteristics and rating scales. Understanding these factors and controlling their effects are crucial for test validity. Rater behaviour has been extensively studied through qualitative methods such as questionnaires and…
Descriptors: Scoring, Item Response Theory, Student Placement, College Students
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M. – ETS Research Report Series, 2015
Automated scoring models were trained and evaluated for the essay task in the "Praxis I"® writing test. Prompt-specific and generic "e-rater"® scoring models were built, and evaluation statistics, such as quadratic weighted kappa, Pearson correlation, and standardized differences in mean scores, were examined to evaluate the…
Descriptors: Writing Tests, Licensing Examinations (Professions), Teacher Competency Testing, Scoring
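The evaluation statistics named in this abstract (quadratic weighted kappa, Pearson correlation, standardized mean score difference) can be reproduced on any pair of human and machine score vectors. The values below are invented for illustration and are not Praxis data.

# Agreement statistics between hypothetical human and automated essay scores.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

human   = np.array([3, 4, 2, 5, 3, 4, 2, 3, 4, 5])
machine = np.array([3, 4, 3, 5, 2, 4, 2, 4, 4, 4])

qwk = cohen_kappa_score(human, machine, weights="quadratic")
r, _ = pearsonr(human, machine)
# Standardized mean score difference (machine minus human, pooled SD).
pooled_sd = np.sqrt((human.var(ddof=1) + machine.var(ddof=1)) / 2)
smd = (machine.mean() - human.mean()) / pooled_sd

print(f"QWK = {qwk:.3f}, Pearson r = {r:.3f}, standardized difference = {smd:.3f}")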
Rios, Joseph A.; Sparks, Jesse R.; Zhang, Mo; Liu, Ou Lydia – ETS Research Report Series, 2017
Proficiency with written communication (WC) is critical for success in college and careers. As a result, institutions face a growing challenge to accurately evaluate their students' writing skills to obtain data that can support demands of accreditation, accountability, or curricular improvement. Many current standardized measures, however, lack…
Descriptors: Test Construction, Test Validity, Writing Tests, College Outcomes Assessment
Zhang, Mo – ETS Research Report Series, 2013
Many testing programs use automated scoring to grade essays. One issue in automated essay scoring that has not been examined adequately is population invariance and its causes. The primary purpose of this study was to investigate the impact of sampling in model calibration on population invariance of automated scores. This study analyzed scores…
Descriptors: Automation, Scoring, Essay Tests, Sampling
Attali, Yigal; Lewis, Will; Steier, Michael – Language Testing, 2013
Automated essay scoring can produce reliable scores that are highly correlated with human scores, but is limited in its evaluation of content and other higher-order aspects of writing. The increased use of automated essay scoring in high-stakes testing underscores the need for human scoring that is focused on higher-order aspects of writing. This…
Descriptors: Scoring, Essay Tests, Reliability, High Stakes Tests
Hale, Chris C. – Language Testing in Asia, 2015
Student self-assessment has been heralded as a way of increasing student ownership of the learning process, enhancing metacognitive awareness of learning progress, and promoting learner autonomy. In a university setting, where a major aim is to promote critical thinking and attentiveness to one's responsibility in an academic…
Descriptors: Self Evaluation (Individuals), Learning Processes, Metacognition, Personal Autonomy