Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 3 |
Since 2006 (last 20 years) | 7 |
Descriptor
Interrater Reliability | 8 |
Scoring | 8 |
Language Tests | 6 |
Correlation | 3 |
Foreign Countries | 3 |
Scores | 3 |
Automation | 2 |
Data Analysis | 2 |
Error of Measurement | 2 |
Evaluators | 2 |
Item Response Theory | 2 |
More ▼ |
Source
Language Testing | 8 |
Author
Attali, Yigal | 1 |
Carey, Michael D. | 1 |
Chan, Stephanie W. Y. | 1 |
Cheung, Wai Ming | 1 |
Deygers, Bart | 1 |
Dunn, Peter K. | 1 |
Huang, Yanli | 1 |
Katzenberger, Irit | 1 |
Lam, Wai-Ip | 1 |
Lewis, Will | 1 |
Lin, Chih-Kai | 1 |
More ▼ |
Publication Type
Journal Articles | 8 |
Reports - Research | 7 |
Reports - Evaluative | 1 |
Education Level
Early Childhood Education | 1 |
Elementary Education | 1 |
Higher Education | 1 |
Kindergarten | 1 |
Postsecondary Education | 1 |
Primary Education | 1 |
Audience
Location
Netherlands | 2 |
China | 1 |
Hong Kong | 1 |
India | 1 |
South Korea | 1 |
Laws, Policies, & Programs
Assessments and Surveys
Graduate Record Examinations | 1 |
Peabody Picture Vocabulary… | 1 |
What Works Clearinghouse Rating
Wang, Zhen; Zechner, Klaus; Sun, Yu – Language Testing, 2018
As automated scoring systems for spoken responses are increasingly used in language assessments, testing organizations need to analyze their performance, as compared to human raters, across several dimensions, for example, on individual items or based on subgroups of test takers. In addition, there is a need in testing organizations to establish…
Descriptors: Automation, Scoring, Speech Tests, Language Tests
Chan, Stephanie W. Y.; Cheung, Wai Ming; Huang, Yanli; Lam, Wai-Ip; Lin, Chin-Hsi – Language Testing, 2020
Demand for second-language (L2) Chinese education for kindergarteners has grown rapidly, but little is known about these kindergarteners' L2 skills, with existing studies focusing on school-age populations and alphabetic languages. Accordingly, we developed a six-subtest Chinese character acquisition assessment to measure L2 kindergarteners'…
Descriptors: Chinese, Second Language Learning, Second Language Instruction, Written Language
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Attali, Yigal; Lewis, Will; Steier, Michael – Language Testing, 2013
Automated essay scoring can produce reliable scores that are highly correlated with human scores, but is limited in its evaluation of content and other higher-order aspects of writing. The increased use of automated essay scoring in high-stakes testing underscores the need for human scoring that is focused on higher-order aspects of writing. This…
Descriptors: Scoring, Essay Tests, Reliability, High Stakes Tests
Deygers, Bart; Van Gorp, Koen – Language Testing, 2015
Considering scoring validity as encompassing both reliable rating scale use and valid descriptor interpretation, this study reports on the validation of a CEFR-based scale that was co-constructed and used by novice raters. The research questions this paper wishes to answer are (a) whether it is possible to construct a CEFR-based rating scale with…
Descriptors: Rating Scales, Scoring, Validity, Interrater Reliability
Katzenberger, Irit; Meilijson, Sara – Language Testing, 2014
The Katzenberger Hebrew Language Assessment for Preschool Children (henceforth: the KHLA) is the first comprehensive, standardized language assessment tool developed in Hebrew specifically for older preschoolers (4;0-5;11 years). The KHLA is a norm-referenced, Hebrew specific assessment, based on well-established psycholinguistic principles, as…
Descriptors: Semitic Languages, Preschool Children, Language Impairments, Language Tests
Carey, Michael D.; Mannell, Robert H.; Dunn, Peter K. – Language Testing, 2011
This study investigated factors that could affect inter-examiner reliability in the pronunciation assessment component of speaking tests. We hypothesized that the rating of pronunciation is susceptible to variation in assessment due to the amount of exposure examiners have to nonnative English accents. An inter-rater variability analysis was…
Descriptors: Oral Language, Pronunciation, Phonology, Interlanguage

Schoonen, Rob; And Others – Language Testing, 1997
Reports on three studies conducted in the Netherlands about the reading reliability of lay and expert readers in rating content and language usage of students' writing performances in three kinds of writing assignments. Findings reveal that expert readers are more reliable in rating usage, whereas both lay and expert readers are reliable raters of…
Descriptors: Foreign Countries, Interrater Reliability, Language Usage, Models