Publication Date
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 2 |
| Since 2017 (last 10 years) | 6 |
Descriptor
| Performance Based Assessment | 6 |
| Language Tests | 4 |
| Second Language Learning | 4 |
| Evaluators | 3 |
| Accuracy | 2 |
| Bias | 2 |
| English (Second Language) | 2 |
| Foreign Countries | 2 |
| Item Response Theory | 2 |
| Language Proficiency | 2 |
| Reliability | 2 |
Source
| Language Testing | 6 |
Author
| Janssen, Gerriet | 1 |
| Khabbazbashi, Nahal | 1 |
| Lin, Chih-Kai | 1 |
| Meier, Valerie | 1 |
| Morita-Mullaney, Trish | 1 |
| Sun-Young Shin | 1 |
| Trace, Jonathan | 1 |
| Wind, Stefanie A. | 1 |
| Yunwen Su | 1 |
Publication Type
| Journal Articles | 6 |
| Reports - Research | 6 |
| Tests/Questionnaires | 1 |
Education Level
| Elementary Secondary Education | 1 |
| High Schools | 1 |
| Secondary Education | 1 |
Location
| Colombia | 1 |
| Indiana | 1 |
| Iran (Tehran) | 1 |
Laws, Policies, & Programs
| No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
| International English… | 1 |
Yunwen Su; Sun-Young Shin – Language Testing, 2024
Rating scales that language testers design should be tailored to the specific test purpose and score use as well as reflect the target construct. Researchers have long argued for the value of data-driven scales for classroom performance assessment, because they are specific to pedagogical tasks and objectives, have rich descriptors to offer useful…
Descriptors: Rating Scales, Language Tests, Test Construction, Performance Based Assessment
Wind, Stefanie A. – Language Testing, 2023
Researchers frequently evaluate rater judgments in performance assessments for evidence of differential rater functioning (DRF), which occurs when rater severity is systematically related to construct-irrelevant student characteristics after controlling for student achievement levels. However, researchers have observed that methods for detecting…
Descriptors: Evaluators, Decision Making, Student Characteristics, Performance Based Assessment
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Trace, Jonathan; Janssen, Gerriet; Meier, Valerie – Language Testing, 2017
Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…
Descriptors: Performance Based Assessment, Second Language Learning, Scoring, Evaluators
Khabbazbashi, Nahal – Language Testing, 2017
This study explores the extent to which topic and background knowledge of topic affect spoken performance in a high-stakes speaking test. It is argued that evidence of a substantial influence may introduce construct-irrelevant variance and undermine test fairness. Data were collected from 81 non-native speakers of English who performed on 10…
Descriptors: Speech Tests, High Stakes Tests, English (Second Language), Language Proficiency
Morita-Mullaney, Trish – Language Testing, 2017
English language proficiency or English language development (ELP/D) standards guide how content-specific instruction and assessment is practiced by teachers and how English learners (ELs) at varying levels of English proficiency can perform grade-level-specific academic standards in K-12 US schools. With the transition from the state-developed…
Descriptors: Language Proficiency, English (Second Language), Second Language Learning, Feminism

