ERIC - Search Results

Publication Date

In 2025	1
Since 2024	4
Since 2021 (last 5 years)	6
Since 2016 (last 10 years)	13
Since 2006 (last 20 years)	18

Descriptor

Item Response Theory	18
Language Tests	13
Foreign Countries	12
English (Second Language)	8
Second Language Learning	8
Test Reliability	8
Interrater Reliability	7
Evaluators	6
Rating Scales	6
Scoring	6
Correlation	5
Scores	5
Writing Tests	5
Comparative Analysis	4
Language Proficiency	4
Test Validity	4
Difficulty Level	3
Elementary School Students	3
Placement Tests	3
Reliability	3
Test Items	3
Writing Evaluation	3
College Students	2
Evaluation Criteria	2
Factor Analysis	2
More ▼

Source

Language Testing

Publication Type

Journal Articles	18
Reports - Research	17
Reports - Evaluative	1
Tests/Questionnaires	1

Education Level

Elementary Education	5
Higher Education	3
Postsecondary Education	3
Early Childhood Education	1
Elementary Secondary Education	1
Grade 6	1
Intermediate Grades	1
Kindergarten	1
Primary Education	1
Secondary Education	1

Audience

Location

China	2
Finland	2
Bulgaria	1
Colombia	1
Georgia	1
Germany	1
Hawaii	1
Hong Kong	1
Iran	1
Japan	1
Kenya	1
Netherlands	1
Russia	1
South Korea	1
Switzerland	1
Taiwan	1
Ukraine	1
More ▼

Laws, Policies, & Programs

Assessments and Surveys

Peabody Picture Vocabulary…	1
Test of English as a Foreign…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 18 results Save | Export

Communal Factors in Rater Severity and Consistency over Time in High-Stakes Oral Assessment

Peer reviewed

Direct link

Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024

This article focuses on rater severity and consistency and their relation to major changes in the rating system in a high-stakes testing context. The study is based on longitudinal data collected from 2009 to 2019 from the second language (L2) Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. We investigated…

Descriptors: Foreign Countries, Interrater Reliability, Evaluators, Item Response Theory

Triangulating Natural Language Processing (NLP)-Based Analysis of Rater Comments and Many-Facet Rasch Measurement (MFRM): An Innovative Approach to Investigating Raters' Application of Rating Scales in Writing Assessment

Peer reviewed

Direct link

Huiying Cai; Xun Yan – Language Testing, 2024

Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…

Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation

Making Each Point Count: Revising a Local Adaptation of the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE Rubric

Peer reviewed

Direct link

Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024

In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…

Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)

A New Scoring Method for Item Response Theory Analysis of C-Tests

Peer reviewed

Direct link

Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025

This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as sentence score, and then each sentence was entered into the analysis as a polytomous…

Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction

Operationalizing the Reading-into-Writing Construct in Analytic Rating Scales: Effects of Different Approaches on Rating

Peer reviewed

Direct link

Lestari, Santi B.; Brunfaut, Tineke – Language Testing, 2023

Assessing integrated reading-into-writing task performances is known to be challenging, and analytic rating scales have been found to better facilitate the scoring of these performances than other common types of rating scales. However, little is known about how specific operationalizations of the reading-into-writing construct in analytic rating…

Descriptors: Reading Writing Relationship, Writing Tests, Rating Scales, Writing Processes

"How Do Raters Learn to Rate?" Many-Facet Rasch Modeling of Rater Performance over the Course of a Rater Certification Program

Peer reviewed

Direct link

Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023

This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…

Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification

A Nonparametric Procedure for Exploring Differences in Rating Quality across Test-Taker Subgroups in Rater-Mediated Writing Assessments

Peer reviewed

Direct link

Wind, Stefanie A. – Language Testing, 2019

Differences in rater judgments that are systematically related to construct-irrelevant characteristics threaten the fairness of rater-mediated writing assessments. Accordingly, it is essential that researchers and practitioners examine the degree to which the psychometric quality of rater judgments is comparable across test-taker subgroups.…

Descriptors: Nonparametric Statistics, Interrater Reliability, Differences, Writing Tests

Validity Evidence for a Sentence Repetition Test of Swiss German Sign Language

Peer reviewed

Direct link

Haug, Tobias; Batty, Aaron Olaf; Venetz, Martin; Notter, Christa; Girard-Groeber, Simone; Knoch, Ute; Audeoud, Mireille – Language Testing, 2020

In this study we seek evidence of validity according to the socio-cognitive framework (Weir, 2005) for a new sentence repetition test (SRT) for young Deaf L1 Swiss German Sign Language (DSGS) users. SRTs have been developed for various purposes for both spoken and sign languages to assess language development in children. In order to address the…

Descriptors: Foreign Countries, Language Tests, Sentences, Repetition

Mapping the Fluctuating Effect of Strategy Use Ability on English Reading Performance for Nursing Students: A Multi-Layered Moderation Analysis Approach

Peer reviewed

Direct link

Cai, Yuyang; Kunnan, Antony John – Language Testing, 2020

An essential hypothesis of modern language assessment theory pertains to the interaction between strategy use ability (strategic competence) and second language knowledge. However, how they interact with each other is rarely explored. Drawing on relevant research in the literature, in this paper we proposed three interaction patterns (i.e.,…

Descriptors: English (Second Language), Second Language Learning, Nursing Education, Reading Tests

Development and Validation of a Chinese Character Acquisition Assessment for Second-Language Kindergarteners

Peer reviewed

Direct link

Chan, Stephanie W. Y.; Cheung, Wai Ming; Huang, Yanli; Lam, Wai-Ip; Lin, Chin-Hsi – Language Testing, 2020

Demand for second-language (L2) Chinese education for kindergarteners has grown rapidly, but little is known about these kindergarteners' L2 skills, with existing studies focusing on school-age populations and alphabetic languages. Accordingly, we developed a six-subtest Chinese character acquisition assessment to measure L2 kindergarteners'…

Descriptors: Chinese, Second Language Learning, Second Language Instruction, Written Language

Evaluating Subscore Uses across Multiple Levels: A Case of Reading and Listening Subscores for Young EFL Learners

Peer reviewed

Direct link

Choi, Ikkyu; Papageorgiou, Spiros – Language Testing, 2020

Stakeholders of language tests are often interested in subscores. However, reporting a subscore is not always justified; a subscore should provide reliable and distinct information to be worth reporting. When a subscore is used for decisions across multiple levels (e.g., individual test takers and schools), it needs to be justified for its…

Descriptors: English (Second Language), Language Tests, Second Language Learning, Scores

A Comparison of Reliability and Precision of Subscore Reporting Methods for a State English Language Proficiency Assessment

Peer reviewed

Direct link

Longabach, Tanya; Peyton, Vicki – Language Testing, 2018

K-12 English language proficiency tests that assess multiple content domains (e.g., listening, speaking, reading, writing) often have subsections based on these content domains; scores assigned to these subsections are commonly known as subscores. Testing programs face increasing customer demands for the reporting of subscores in addition to the…

Descriptors: Comparative Analysis, Test Reliability, Second Language Learning, Language Proficiency

Measuring the Impact of Rater Negotiation in Writing Performance Assessment

Peer reviewed

Direct link

Trace, Jonathan; Janssen, Gerriet; Meier, Valerie – Language Testing, 2017

Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…

Descriptors: Performance Based Assessment, Second Language Learning, Scoring, Evaluators

A Comparison of Three Test Formats to Assess Word Difficulty

Peer reviewed

Direct link

Culligan, Brent – Language Testing, 2015

This study compared three common vocabulary test formats, the Yes/No test, the Vocabulary Knowledge Scale (VKS), and the Vocabulary Levels Test (VLT), as measures of vocabulary difficulty. Vocabulary difficulty was defined as the item difficulty estimated through Item Response Theory (IRT) analysis. Three tests were given to 165 Japanese students,…

Descriptors: Language Tests, Test Format, Comparative Analysis, Vocabulary

Determining the Scoring Validity of a Co-Constructed CEFR-Based Rating Scale

Peer reviewed

Direct link

Deygers, Bart; Van Gorp, Koen – Language Testing, 2015

Considering scoring validity as encompassing both reliable rating scale use and valid descriptor interpretation, this study reports on the validation of a CEFR-based scale that was co-constructed and used by novice raters. The research questions this paper wishes to answer are (a) whether it is possible to construct a CEFR-based rating scale with…

Descriptors: Rating Scales, Scoring, Validity, Interrater Reliability

Previous Page | Next Page »

Pages: 1 | 2

Alanen, Riikka	1
Ann Tai Choe	1
Audeoud, Mireille	1
Batty, Aaron Olaf	1
Brunfaut, Tineke	1
Cai, Yuyang	1
Chan, Stephanie W. Y.	1
Cheung, Wai Ming	1
Choi, Ikkyu	1
Chuang, Ping-Lin	1
Culligan, Brent	1
Daniel Holden	1
Daniel R. Isbell	1
Deygers, Bart	1
Eckes, Thomas	1
Esmat Babaii	1
Farshad Effatpanah	1
Girard-Groeber, Simone	1
Haug, Tobias	1
Hirvelä, Tuija	1
Hsieh, Mingchuan	1
Huang, Yanli	1
Huhta, Ari	1
Huiying Cai	1
Iasonas Lamprianou	1
More ▼