Publication Date
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 3 |
| Since 2017 (last 10 years) | 10 |
| Since 2007 (last 20 years) | 29 |
Descriptor
| Reliability | 48 |
| Writing Tests | 48 |
| Scores | 20 |
| English (Second Language) | 16 |
| Scoring | 16 |
| Foreign Countries | 14 |
| Second Language Learning | 14 |
| Writing Evaluation | 14 |
| Validity | 13 |
| Comparative Analysis | 12 |
| Essays | 11 |
Author
| Kantor, Robert | 5 |
| Lee, Yong-Won | 5 |
| Attali, Yigal | 3 |
| Burstein, Jill | 2 |
| Deane, Paul | 2 |
| Mollaun, Pam | 2 |
| Abdel-Haq, Eman Muhammad | 1 |
| Ahmed, Tamim | 1 |
| Ahmed, Yusra | 1 |
| Al-Sayed, Rania Kamal Muhammad | 1 |
| Ali Al-Barakat | 1 |
Education Level
| Higher Education | 11 |
| Secondary Education | 10 |
| Postsecondary Education | 9 |
| Elementary Education | 7 |
| High Schools | 5 |
| Elementary Secondary Education | 3 |
| Grade 10 | 3 |
| Grade 11 | 3 |
| Middle Schools | 3 |
| Grade 4 | 2 |
| Grade 7 | 2 |
Laws, Policies, & Programs
| No Child Left Behind Act 2001 | 1 |
Ali Al-Barakat; Rommel AlAli; Omayya Al-Hassan; Khaled Al-Saud – Educational Process: International Journal, 2025
Background/purpose: The study tries to discover how predictive thinking can be incorporated into writing activities to assist students in developing their creative skills in writing learning environments. Through this study, teachers will be able to adopt a new teaching method that helps transform the way creative writing is taught in language…
Descriptors: Thinking Skills, Creative Writing, Writing Instruction, Validity
Osama Koraishi – Language Teaching Research Quarterly, 2024
This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence
Gioia, Anthony R.; Ahmed, Yusra; Woods, Steven P.; Cirino, Paul T. – Reading and Writing: An Interdisciplinary Journal, 2023
There is significant overlap between reading and writing, but no known standardized measure assesses these jointly. The goal of the present study is to evaluate the properties of a novel measure, the Assessment of Writing, Self-Monitoring, and Reading (AWSM Reader), that simultaneously evaluates both reading comprehension and writing. In doing so,…
Descriptors: Reading Writing Relationship, Writing Evaluation, Self Evaluation (Individuals), Executive Function
Song, Yi; Deane, Paul; Beigman Klebanov, Beata – ETS Research Report Series, 2017
This project focuses on laying the foundations for automated analysis of argumentation schemes, supporting identification and classification of the arguments being made in a text, for the purpose of scoring the quality of written analyses of arguments. We developed annotation protocols for 20 argument prompts from a college-level test under the…
Descriptors: Scoring, Automation, Persuasive Discourse, Documentation
Sari, Elif; Han, Turgay – Reading Matrix: An International Online Journal, 2021
Providing both effective feedback applications and reliable assessment practices are two central issues in ESL/EFL writing instruction contexts. Giving individual feedback is very difficult in crowded classes as it requires a great amount of time and effort for instructors. Moreover, instructors likely employ inconsistent assessment procedures,…
Descriptors: Automation, Writing Evaluation, Artificial Intelligence, Natural Language Processing
Choi, Ikkyu; Hao, Jiangang; Deane, Paul; Zhang, Mo – ETS Research Report Series, 2021
"Biometrics" are physical or behavioral human characteristics that can be used to identify a person. It is widely known that keystroke or typing dynamics for short, fixed texts (e.g., passwords) could serve as a behavioral biometric. In this study, we investigate whether keystroke data from essay responses can lead to a reliable…
Descriptors: Accuracy, High Stakes Tests, Writing Tests, Benchmarking
Qu, Yanxuan; Huo, Yan; Chan, Eric; Shotts, Matthew – ETS Research Report Series, 2017
For educational tests, it is critical to maintain consistency of score scales and to understand the sources of variation in score means over time. This practice helps to ensure that interpretations about test takers' abilities are comparable from one administration (or one form) to another. This study examines the consistency of reported scores…
Descriptors: Scores, English (Second Language), Language Tests, Second Language Learning
Consistency and Stability of Italian Children's Spelling in Dictation versus Composition Assessments
Bigozzi, Lucia; Tarchi, Christian; Pinto, Giuliana – Reading & Writing Quarterly, 2017
The purpose of this study was to investigate consistency in spelling skills across 2 different tasks of written production (dictation vs. composition) and stability of performance across 4 different grades. We assessed 2nd, 3rd, 4th, and 5th graders' spelling performance through 4 tasks: 2 dictation tasks (passage and sentences) and 2 composition…
Descriptors: Foreign Countries, Spelling, Reliability, Verbal Communication
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
Trace, Jonathan; Janssen, Gerriet; Meier, Valerie – Language Testing, 2017
Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…
Descriptors: Performance Based Assessment, Second Language Learning, Scoring, Evaluators
Kural, Faruk – Journal of Language and Linguistic Studies, 2018
The present paper, which is a study based on midterm exam results of 53 University English prep-school students, examines correlation between a direct writing test, measured holistically by multiple-trait scoring, and two indirect writing tests used in a competence exam, one of which is a multiple-choice cloze test and the other a rewrite test…
Descriptors: Writing Evaluation, Cloze Procedure, Comparative Analysis, Essays
Humphry, Stephen M.; McGrane, Joshua A. – Australian Educational Researcher, 2015
This paper presents a method for equating writing assessments using pairwise comparisons which does not depend upon conventional common-person or common-item equating designs. Pairwise comparisons have been successfully applied in the assessment of open-ended tasks in English and other areas such as visual art and philosophy. In this paper,…
Descriptors: Writing Evaluation, Evaluation Methods, Comparative Analysis, Writing Tests
Ricker-Pedley, Kathryn L. – Educational Testing Service, 2011
A pseudo-experimental study was conducted to examine the link between rater accuracy calibration performances and subsequent accuracy during operational scoring. The study asked 45 raters to score a 75-response calibration set and then a 100-response (operational) set of responses from a retired Graduate Record Examinations[R] (GRE[R]) writing…
Descriptors: Scoring, Accuracy, College Entrance Examinations, Writing Tests
He, Qingping; Anwyll, Steve; Glanville, Matthew; Deavall, Angela – Educational Research, 2013
Background: Although there has been considerable research into the reliability of marking for the Key Stage 3 (KS3) National Curriculum tests (NCTs) and public examinations such as the General Certificate of Secondary Education examinations (GCSEs) in England, little is understood about the level of reliability of marking of the Key Stage 2 (KS2)…
Descriptors: National Curriculum, Foreign Countries, Writing Skills, Writing Tests
Ahmed, Tamim; Hanif, Maria – Journal of Education and Practice, 2016
This study is intended to investigate students' achievement capability across two groups of families, i.e., low- and high-income families, and is designed for primary-level learners. A Reading, Arithmetic and Writing (RAW) Achievement test that was developed as a part of another research study (Tamim Ahmed Khan, 2015) was adopted for this study. Both English medium…
Descriptors: Low Income, Performance Based Assessment, Elementary School Students, Achievement Tests

