Publication Date
In 2025 | 2 |
Since 2024 | 4 |
Since 2021 (last 5 years) | 6 |
Since 2016 (last 10 years) | 11 |
Since 2006 (last 20 years) | 16 |
Descriptor
Scores | 16 |
Writing Evaluation | 16 |
English (Second Language) | 11 |
Second Language Learning | 11 |
Essays | 10 |
Evaluators | 8 |
Language Tests | 8 |
Writing Tests | 8 |
Foreign Countries | 7 |
Language Proficiency | 6 |
Comparative Analysis | 5 |
Source
Language Testing | 16 |
Author
Barkaoui, Khaled | 2 |
Gebril, Atta | 2 |
Ann Tai Choe | 1 |
Attali, Yigal | 1 |
Bilki, Zeynep | 1 |
Bond, Trevor | 1 |
Bouwer, Renske | 1 |
Béguin, Anton | 1 |
Chan, Kinnie Kin Yee | 1 |
Crossley, Scott | 1 |
Crossley, Scott A. | 1 |
Publication Type
Journal Articles | 16 |
Reports - Research | 14 |
Tests/Questionnaires | 2 |
Reports - Evaluative | 1 |
Education Level
Secondary Education | 5 |
Higher Education | 4 |
Elementary Education | 2 |
High Schools | 2 |
Postsecondary Education | 2 |
Adult Education | 1 |
Assessments and Surveys
Test of English as a Foreign… | 4 |
Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025
Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024
In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…
Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)
Ray J. T. Liao; Renka Ohta; Kwangmin Lee – Language Testing, 2024
As integrated writing tasks in large-scale and classroom-based writing assessments have risen in popularity, research studies have increasingly concentrated on providing validity evidence. Given the fact that most of these studies focus on adult second language learners rather than younger ones, this study examined the relationship between written…
Descriptors: Writing (Composition), Writing Evaluation, English Language Learners, Discourse Analysis
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Chan, Kinnie Kin Yee; Bond, Trevor; Yan, Zi – Language Testing, 2023
We investigated the relationship between the scores assigned by an Automated Essay Scoring (AES) system, the Intelligent Essay Assessor (IEA), and grades allocated by trained, professional human raters to English essay writing by instigating two procedures novel to written-language assessment: the logistic transformation of AES raw scores into…
Descriptors: Computer Assisted Testing, Essays, Scoring, Scores
Taichi Yamashita – Language Testing, 2025
With the rapid development of generative artificial intelligence (AI) frameworks (e.g., the generative pre-trained transformer [GPT]), a growing number of researchers have started to explore its potential as an automated essay scoring (AES) system. While previous studies have investigated the alignment between human ratings and GPT ratings, few…
Descriptors: Artificial Intelligence, English (Second Language), Second Language Learning, Second Language Instruction
Shi, Bibing; Huang, Liyan; Lu, Xiaofei – Language Testing, 2020
The continuation task, a new form of reading-writing integrated task in which test-takers read an incomplete story and then write the continuation and ending of the story, has been increasingly used in writing assessment, especially in China. However, language-test developers' understanding of the effects of important task-related factors on…
Descriptors: Cues, Writing Tests, Writing Evaluation, English (Second Language)
Plakans, Lia; Gebril, Atta; Bilki, Zeynep – Language Testing, 2019
The present study investigates integrated writing assessment performances with regard to the linguistic features of complexity, accuracy, and fluency (CAF). Given the increasing presence of integrated tasks in large-scale and classroom assessments, validity evidence is needed for the claim that their scores reflect targeted language abilities.…
Descriptors: Accuracy, Language Tests, Scores, Writing Evaluation
Sahan, Özgür; Razi, Salim – Language Testing, 2020
This study examines the decision-making behaviors of raters with varying levels of experience while assessing EFL essays of distinct qualities. The data were collected from 28 raters with varying levels of rating experience and working at the English language departments of different universities in Turkey. Using a 10-point analytic rubric, each…
Descriptors: Decision Making, Essays, Writing Evaluation, Evaluators
Attali, Yigal – Language Testing, 2016
A short training program for evaluating responses to an essay writing task consisted of scoring 20 training essays with immediate feedback about the correct score. The same scoring session also served as a certification test for trainees. Participants with little or no previous rating experience completed this session and 14 trainees who passed an…
Descriptors: Writing Evaluation, Writing Tests, Standardized Tests, Evaluators
Kyle, Kristopher; Crossley, Scott – Language Testing, 2017
Over the past 45 years, the construct of syntactic sophistication has been assessed in L2 writing using what Bulté and Housen (2012) refer to as absolute complexity (Lu, 2011; Ortega, 2003; Wolfe-Quintero, Inagaki, & Kim, 1998). However, it has been argued that making inferences about learners based on absolute complexity indices (e.g., mean…
Descriptors: Syntax, Verbs, Second Language Learning, Word Frequency
Bouwer, Renske; Béguin, Anton; Sanders, Ted; van den Bergh, Huub – Language Testing, 2015
In the present study, aspects of the measurement of writing are disentangled in order to investigate the validity of inferences made on the basis of writing performance and to describe implications for the assessment of writing. To include genre as a facet in the measurement, we obtained writing scores of 12 texts in four different genres for each…
Descriptors: Writing Tests, Generalization, Scores, Writing Instruction
Barkaoui, Khaled – Language Testing, 2014
A major concern with computer-based (CB) tests of second-language (L2) writing is that performance on such tests may be influenced by test-taker keyboarding skills. Poor keyboarding skills may force test-takers to focus their attention and cognitive resources on motor activities (i.e., keyboarding) and, consequently, other processes and aspects of…
Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning
Crossley, Scott A.; Salsbury, Tom; McNamara, Danielle S.; Jarvis, Scott – Language Testing, 2011
The authors present a model of lexical proficiency based on lexical indices related to vocabulary size, depth of lexical knowledge, and accessibility to core lexical items. The lexical indices used in this study come from the computational tool Coh-Metrix and include word length scores, lexical diversity values, word frequency counts, hypernymy…
Descriptors: Semantics, Familiarity, Second Language Learning, Word Frequency
Barkaoui, Khaled – Language Testing, 2011
Think-aloud protocols (TAPs) are frequently used in research on essay rating processes. However, there are very few empirical studies of the completeness of TAP data and the effects of this technique on rater performance (i.e., rating processes and outcomes). This study aims to start to address this research gap. As part of a larger study on rater…
Descriptors: Protocol Analysis, Rating Scales, Essays, English (Second Language)