ERIC - Search Results

Publication Date

In 2025	1
Since 2024	8
Since 2021 (last 5 years)	22
Since 2016 (last 10 years)	43
Since 2006 (last 20 years)	72

Descriptor

Evaluators	76
Language Tests	58
Second Language Learning	58
English (Second Language)	42
Language Proficiency	29
Oral Language	24
Scores	23
Foreign Countries	22
Writing Evaluation	21
Scoring	20
Second Language Instruction	17
Rating Scales	16
Interrater Reliability	15
Correlation	13
Writing Tests	13
Evaluation Criteria	12
Testing	12
Comparative Analysis	11
Essays	11
Speech Communication	11
Evaluation Methods	10
Language Teachers	10
Accuracy	9
Item Response Theory	9
Statistical Analysis	9

Source

Language Testing

Publication Type

Journal Articles	76
Reports - Research	66
Tests/Questionnaires	7
Reports - Descriptive	4
Reports - Evaluative	4
Information Analyses	2
Opinion Papers	1
Speeches/Meeting Papers	1

Education Level

Higher Education	23
Postsecondary Education	16
Secondary Education	6
Elementary Education	2
Elementary Secondary Education	2
Adult Education	1
High Schools	1

Location

China	6
Australia	3
Europe	3
Netherlands	3
India	2
Turkey	2
California (San Francisco)	1
Canada	1
Colombia	1
Finland	1
Hawaii	1
Illinois (Urbana)	1
Japan	1
Michigan	1
New York (New York)	1
Ohio	1
South Korea	1
Switzerland	1
United States	1

Assessments and Surveys

Test of English as a Foreign…	8
International English…	3
ACTFL Oral Proficiency…	1

Showing 1 to 15 of 76 results

Communal Factors in Rater Severity and Consistency over Time in High-Stakes Oral Assessment

Peer reviewed

Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024

This article focuses on rater severity and consistency and their relation to major changes in the rating system in a high-stakes testing context. The study is based on longitudinal data collected from 2009 to 2019 from the second language (L2) Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. We investigated…

Descriptors: Foreign Countries, Interrater Reliability, Evaluators, Item Response Theory

Administration, Labor, and Love

Peer reviewed

Ginther, April – Language Testing, 2023

Great opportunities for language testing practitioners are enabled through language program administration. Local language tests lend themselves to multiple purposes--for placement and diagnosis, as a means of tracking progress, and as a contribution to program evaluation and revision. Administrative choices, especially those involving a test, are…

Descriptors: Language Tests, Testing, Examiners, Placement Tests

Language Testers and Their Place in the Policy Web

Peer reviewed

Laura Schildt; Bart Deygers; Albert Weideman – Language Testing, 2024

In the context of policy-driven language testing for citizenship, a growing body of research examines the political justifications and ethical implications of language requirements and test use. However, virtually no studies have looked at the role that language testers play in the evolution of language requirements. Critical gaps remain in our…

Descriptors: Language Tests, Citizenship, Educational Policy, Assessment Literacy

Do Source Use Features Impact Raters' Judgment of Argumentation? An Experimental Study

Peer reviewed

Ping-Lin Chuang – Language Testing, 2025

This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…

Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources

Triangulating Natural Language Processing (NLP)-Based Analysis of Rater Comments and Many-Facet Rasch Measurement (MFRM): An Innovative Approach to Investigating Raters' Application of Rating Scales in Writing Assessment

Peer reviewed

Huiying Cai; Xun Yan – Language Testing, 2024

Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…

Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation

Revisiting Raters' Accent Familiarity in Speaking Tests: Evidence That Presentation Mode Interacts with Accent Familiarity to Variably Affect Comprehensibility Ratings

Peer reviewed

Michael D. Carey; Stefan Szocs – Language Testing, 2024

This controlled experimental study investigated the interaction of variables associated with rating the pronunciation component of high-stakes English-language-speaking tests such as IELTS and TOEFL iBT. One hundred experienced raters who were all either familiar or unfamiliar with Brazilian-accented English or Papua New Guinean Tok Pisin-accented…

Descriptors: Dialects, Pronunciation, Suprasegmentals, Familiarity

Assessing the Content Quality of Essays in Content and Language Integrated Learning: Exploring the Construct from Subject Specialists' Perspectives

Peer reviewed

Takanori Sato – Language Testing, 2024

Assessing the content of learners' compositions is a common practice in second language (L2) writing assessment. However, the construct definition of content in L2 writing assessment potentially underrepresents the target competence in content and language integrated learning (CLIL), which aims to foster not only L2 proficiency but also critical…

Descriptors: Language Tests, Content and Language Integrated Learning, Writing Evaluation, Writing Tests

Making Each Point Count: Revising a Local Adaptation of the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE Rubric

Peer reviewed

Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024

In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…

Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)

Operationalizing the Reading-into-Writing Construct in Analytic Rating Scales: Effects of Different Approaches on Rating

Peer reviewed

Lestari, Santi B.; Brunfaut, Tineke – Language Testing, 2023

Assessing integrated reading-into-writing task performances is known to be challenging, and analytic rating scales have been found to better facilitate the scoring of these performances than other common types of rating scales. However, little is known about how specific operationalizations of the reading-into-writing construct in analytic rating…

Descriptors: Reading Writing Relationship, Writing Tests, Rating Scales, Writing Processes

More Efficient Processes for Creating Automated Essay Scoring Frameworks: A Demonstration of Two Algorithms

Peer reviewed

Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021

Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…

Descriptors: Scoring, Essays, Writing Evaluation, Computer Software

"How Do Raters Learn to Rate?" Many-Facet Rasch Modeling of Rater Performance over the Course of a Rater Certification Program

Peer reviewed

Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023

This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…

Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification

Application of an Automated Essay Scoring Engine to English Writing Assessment Using Many-Facet Rasch Measurement

Peer reviewed

Chan, Kinnie Kin Yee; Bond, Trevor; Yan, Zi – Language Testing, 2023

We investigated the relationship between the scores assigned by an Automated Essay Scoring (AES) system, the Intelligent Essay Assessor (IEA), and grades allocated by trained, professional human raters to English essay writing by instigating two procedures novel to written-language assessment: the logistic transformation of AES raw scores into…

Descriptors: Computer Assisted Testing, Essays, Scoring, Scores

Towards More Valid Scoring Criteria for Integrated Reading-Writing and Listening-Writing Summary Tasks

Peer reviewed

Chan, Sathena; May, Lyn – Language Testing, 2023

Despite the increased use of integrated tasks in high-stakes academic writing assessment, research on rating criteria which reflect the unique construct of integrated summary writing skills is comparatively rare. Using a mixed-method approach of expert judgement, text analysis, and statistical analysis, this study examines writing features that…

Descriptors: Scoring, Writing Evaluation, Reading Tests, Listening Skills

Challenges in Rating Signed Production: A Mixed-Methods Study of a Swiss German Sign Language Form-Recall Vocabulary Test

Peer reviewed

Batty, Aaron Olaf; Haug, Tobias; Ebling, Sarah; Tissi, Katja; Sidler-Miserez, Sandra – Language Testing, 2023

Sign languages present particular challenges to language assessors in relation to variation in signs, weakly defined citation forms, and a general lack of standard-setting work even in long-established measures of productive sign proficiency. The present article addresses and explores these issues via a mixed-methods study of a human-rated…

Descriptors: Sign Language, Language Tests, Standard Setting, Barriers

A Sequential Approach to Detecting Differential Rater Functioning in Sparse Rater-Mediated Assessment Networks

Peer reviewed

Wind, Stefanie A. – Language Testing, 2023

Researchers frequently evaluate rater judgments in performance assessments for evidence of differential rater functioning (DRF), which occurs when rater severity is systematically related to construct-irrelevant student characteristics after controlling for student achievement levels. However, researchers have observed that methods for detecting…

Descriptors: Evaluators, Decision Making, Student Characteristics, Performance Based Assessment

Author

Pill, John	4
May, Lyn	3
Yan, Xun	3
Barkaoui, Khaled	2
Elder, Catherine	2
Han, Chao	2
Kuiken, Folkert	2
Lim, Gad S.	2
Lin, Chih-Kai	2
Mollaun, Pamela	2
Sanders, Ted	2
Vedder, Ineke	2
Wind, Stefanie A.	2
Xi, Xiaoming	2
Zhang, Ying	2
van den Bergh, Huub	2
Albert Weideman	1
Ann Tai Choe	1
Attali, Yigal	1
Bachman, Lyle F.	1
Barkhuizen, Gary	1
Bart Deygers	1
Batty, Aaron Olaf	1
Bond, Trevor	1
Bouwer, Renske	1