Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 2 |
Since 2016 (last 10 years) | 8 |
Since 2006 (last 20 years) | 16 |
Descriptor
Performance Based Assessment | 21 |
Language Tests | 14 |
Second Language Learning | 14 |
English (Second Language) | 10 |
Evaluators | 8 |
Foreign Countries | 6 |
Performance Tests | 6 |
Scoring | 5 |
Writing Tests | 5 |
Language Proficiency | 4 |
Scores | 4 |
Source
Language Testing | 21 |
Author
Lim, Gad S. | 2 |
Xi, Xiaoming | 2 |
Al-Hamly, Mashael | 1 |
Barkaoui, Khaled | 1 |
Brindley, Geoff | 1 |
Coombe, Christine | 1 |
Eckes, Thomas | 1 |
Hoekje, Barbara | 1 |
Huang, Shu-Chen | 1 |
Janssen, Gerriet | 1 |
Johnson, Jeff S. | 1 |
Publication Type
Journal Articles | 21 |
Reports - Research | 15 |
Reports - Evaluative | 5 |
Opinion Papers | 1 |
Reports - Descriptive | 1 |
Tests/Questionnaires | 1 |
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
International English… | 1 |
Test of English as a Foreign… | 1 |
Yunwen Su; Sun-Young Shin – Language Testing, 2024
Rating scales that language testers design should be tailored to the specific test purpose and score use as well as reflect the target construct. Researchers have long argued for the value of data-driven scales for classroom performance assessment, because they are specific to pedagogical tasks and objectives, have rich descriptors to offer useful…
Descriptors: Rating Scales, Language Tests, Test Construction, Performance Based Assessment
Wind, Stefanie A. – Language Testing, 2023
Researchers frequently evaluate rater judgments in performance assessments for evidence of differential rater functioning (DRF), which occurs when rater severity is systematically related to construct-irrelevant student characteristics after controlling for student achievement levels. However, researchers have observed that methods for detecting…
Descriptors: Evaluators, Decision Making, Student Characteristics, Performance Based Assessment
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Trace, Jonathan; Janssen, Gerriet; Meier, Valerie – Language Testing, 2017
Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…
Descriptors: Performance Based Assessment, Second Language Learning, Scoring, Evaluators
Khabbazbashi, Nahal – Language Testing, 2017
This study explores the extent to which topic and background knowledge of topic affect spoken performance in a high-stakes speaking test. It is argued that evidence of a substantial influence may introduce construct-irrelevant variance and undermine test fairness. Data were collected from 81 non-native speakers of English who performed on 10…
Descriptors: Speech Tests, High Stakes Tests, English (Second Language), Language Proficiency
Hoekje, Barbara – Language Testing, 2016
This commentary argues that the OET research raises inescapable contradictions in trying to separate "language" from "communication" within a weak performance test and advocates for reconceptualizing the legitimate domain of "language" more widely, reclaiming the full potential of the communicative competence…
Descriptors: Language Tests, Languages for Special Purposes, Second Language Learning, Communicative Competence (Languages)
Morita-Mullaney, Trish – Language Testing, 2017
English language proficiency or English language development (ELP/D) standards guide how content-specific instruction and assessment is practiced by teachers and how English learners (ELs) at varying levels of English proficiency can perform grade-level-specific academic standards in K-12 US schools. With the transition from the state-developed…
Descriptors: Language Proficiency, English (Second Language), Second Language Learning, Feminism
Manias, Elizabeth; McNamara, Tim – Language Testing, 2016
This paper explores the views of nursing and medical domain experts in considering the standards for a specific-purpose English language screening test, the Occupational English Test (OET), for professional registration for immigrant health professionals. Since individuals who score performances in the test setting are often language experts…
Descriptors: Standard Setting, Academic Standards, English for Special Purposes, Language Tests
Lim, Gad S. – Language Testing, 2011
Raters are central to writing performance assessment, and rater development--training, experience, and expertise--involves a temporal dimension. However, few studies have examined new and experienced raters' rating performance longitudinally over multiple time points. This study uses operational data from the writing section of the MELAB (n =…
Descriptors: Expertise, Writing Evaluation, Performance Based Assessment, Writing Tests
Barkaoui, Khaled – Language Testing, 2010
This study adopted a multilevel modeling (MLM) approach to examine the contribution of rater and essay factors to variability in ESL essay holistic scores. Previous research aiming to explain variability in essay holistic scores has focused on either rater or essay factors. The few studies that have examined the contribution of more than one…
Descriptors: Performance Based Assessment, English (Second Language), Second Language Learning, Holistic Approach
Huang, Shu-Chen – Language Testing, 2011
This study examined two types of classroom assessment events, the more closed convergent assessments (CA) versus the more open-ended divergent assessments (DA), to see if they influence learners differently in terms of motivation and learning strategies. Participants were 105 college freshmen in Taiwan with the same instructor placed under one…
Descriptors: College Freshmen, Speech Communication, Self Efficacy, Performance Based Assessment
Kim, Youn-Hee – Language Testing, 2011
Despite the increasing interest in and need for test information for use in instructional practice and student learning, there have been few attempts to systematically link a diagnostic approach to English for academic purposes (EAP) writing instruction and assessment. In response to this need for research, this study examined the extent to which…
Descriptors: Performance Based Assessment, Performance Tests, Diagnostic Tests, Discriminant Analysis
Johnson, Jeff S.; Lim, Gad S. – Language Testing, 2009
Language performance assessments typically require human raters, introducing possible error. In international examinations of English proficiency, rater language background is an especially salient factor that needs to be considered. The existence of rater language background-related bias in writing performance assessment is the object of this…
Descriptors: Performance Based Assessment, Performance Tests, Native Speakers, English (Second Language)
Eckes, Thomas – Language Testing, 2008
Research on rater effects in language performance assessments has provided ample evidence for a considerable degree of variability among raters. Building on this research, I advance the hypothesis that experienced raters fall into types or classes that are clearly distinguishable from one another with respect to the importance they attach to…
Descriptors: Performance Based Assessment, Language Tests, Measures (Individuals), Scoring
Knoch, Ute – Language Testing, 2009
Alderson (2005) suggests that diagnostic tests should identify strengths and weaknesses in learners' use of language and focus on specific elements rather than global abilities. However, rating scales used in performance assessment have been repeatedly criticized for being imprecise and therefore often resulting in holistic marking by raters…
Descriptors: Feedback (Response), Language Usage, Performance Based Assessment, Performance Tests