Bolton, Tiffany; Stevenson, Brittney; Janes, William – Journal of Occupational Therapy, Schools & Early Intervention, 2023
Researchers utilized a cross-sectional secondary analysis of data within an ongoing non-randomized controlled trial study design to establish the reliability and internal consistency of a novel handwriting assessment for preschoolers, the Just Write! (JW), written by the authors. Seventy-eight children from an area preschool participated in the…
Descriptors: Handwriting, Writing Skills, Writing Evaluation, Preschool Children
Lynsey Joohyun Lee – ProQuest LLC, 2021
Reliability and validity are two important topics that have been studied for many decades in the educational measurement field, including discussions of Writing Studies' subfield of writing assessment, since the establishment of the College Entrance Exam Board [CEEB] in 1899 (Huot et al., 2010). In recent years, scholarly conversations of fairness…
Descriptors: Writing Evaluation, Test Validity, Test Reliability, Case Studies
Atilgan, Hakan – Eurasian Journal of Educational Research, 2019
Purpose: This study intended to examine the generalizability and reliability of essay ratings within the scope of the generalizability (G) theory. Specifically, the effect of raters on the generalizability and reliability of students' essay ratings was examined. Furthermore, variations of the generalizability and reliability coefficients with…
Descriptors: Foreign Countries, Essay Tests, Test Reliability, Interrater Reliability
Wendler, Cathy; Glazer, Nancy; Cline, Frederick – ETS Research Report Series, 2019
One of the challenges in scoring constructed-response (CR) items and tasks is ensuring that rater drift does not occur during or across scoring windows. Rater drift reflects changes in how raters interpret and use established scoring criteria to assign essay scores. Calibration is a process used to help control rater drift and, as such, serves as…
Descriptors: College Entrance Examinations, Graduate Study, Accuracy, Test Reliability
Michelle Herridge – ProQuest LLC, 2021
Evaluation of student written work during summative assessments is an important and critical task for instructors at all educational levels. Nevertheless, few research studies exist that provide insights into how different instructors approach this task. Chemistry faculty (FIs) and graduate student instructors (GSIs) regularly engage in the…
Descriptors: Science Instruction, Chemistry, College Faculty, Teaching Assistants
Uzun, N. Bilge; Alici, Devrim; Aktas, Mehtap – European Journal of Educational Research, 2019
The purpose of this study is to examine the reliability of analytical rubrics and checklists developed for the assessment of story writing skills by means of generalizability theory. The study group consisted of 52 students attending the 5th grade at primary school and 20 raters at Mersin University. The G study was carried out with the fully crossed…
Descriptors: Foreign Countries, Scoring Rubrics, Check Lists, Writing Tests
Ballard, Laura – ProQuest LLC, 2017
Rater scoring has an impact on writing test reliability and validity. Thus, there has been a continued call for researchers to investigate issues related to rating (Crusan, 2015). Investigating the scoring process and understanding how raters arrive at particular scores are critical "because the score is ultimately what will be used in making…
Descriptors: Evaluators, Schemata (Cognition), Eye Movements, Scoring Rubrics
Razi, Salim – SAGE Open, 2015
Similarity reports of plagiarism detectors should be approached with caution as they may not be sufficient to support allegations of plagiarism. This study developed a 50-item rubric to simplify and standardize evaluation of academic papers. In the spring semester of the 2011-2012 academic year, 161 freshmen's papers at the English Language Teaching…
Descriptors: Foreign Countries, Scoring Rubrics, Writing Evaluation, Writing (Composition)
Heldsinger, Sandra A.; Humphry, Stephen M. – Educational Research, 2013
Background: Many in education argue for the importance of incorporating teacher judgements in the assessment and reporting of student performance. Advocates of such an approach are cognisant, though, that obtaining a satisfactory level of consistency in teacher judgements poses a challenge. Purpose: This study investigates the extent to which the…
Descriptors: Evaluation Methods, Student Evaluation, Teacher Attitudes, Comparative Analysis
Erling, Elizabeth J.; Richardson, John T. E. – Assessing Writing, 2010
Measuring the Academic Skills of University Students is a procedure developed in the 1990s at the University of Sydney's Language Centre to identify students in need of academic writing development by assessing examples of their written work against five criteria. This paper reviews the literature relating to the development of the procedure with…
Descriptors: Foreign Countries, Writing Evaluation, Assignments, Psychometrics
Gebril, Atta – Assessing Writing, 2010
Integrated tasks are currently employed in a number of L2 exams since they are perceived as an addition to the writing-only task type. Given this trend, the current study investigates composite score generalizability of both reading-to-write and writing-only tasks. For this purpose, a multivariate generalizability analysis is used to investigate…
Descriptors: Scoring, Scores, Second Language Instruction, Writing Evaluation
Lim, Gad S. – ProQuest LLC, 2009
Performance assessments have become the norm for evaluating language learners' writing abilities in international examinations of English proficiency. Two aspects of these assessments are usually systematically varied: test takers respond to different prompts, and their responses are read by different raters. This raises the possibility of undue…
Descriptors: Performance Based Assessment, Language Tests, Performance Tests, Test Validity

Spaulding, Cheryl L. – Journal of Reading, 1989
Reviews "Written Language Assessment" (WLA), a new standardized test to evaluate children's and adolescents' written language competence by having students write essays instead of answering multiple-choice questions. Finds problems with the WLA in terms of interrater reliability. (RS)
Descriptors: Elementary Secondary Education, Essay Tests, Interrater Reliability, Standardized Tests

Anderson, Stephen A. – Michigan Reading Journal, 2002
Considers the development of an inter-rater reliability correlation comparing the judgments, or scores, of each judge to see if their observations are similar. Presents a case study of the Northville Public Schools' data for the 2000 MEAP (Michigan Educational Assessment Program) Writing Test. Concludes that in this case study the state fails both…
Descriptors: Case Studies, Elementary Education, Evaluation Research, Interrater Reliability
Paden, Patricia A. – 1986
Two factors which may affect the ratings assigned to an essay test are investigated: (1) context effects; and (2) score level effects. Context effects exist in essay scoring if an essay is rated higher when preceded by poor quality essays than when preceded by high quality essays. A score level effect is defined as a change in the score (value)…
Descriptors: Context Effect, Essay Tests, Holistic Evaluation, Interrater Reliability