ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	0
Since 2016 (last 10 years)	4
Since 2006 (last 20 years)	5

Descriptor

Interrater Reliability	5
Scores	5
Language Tests	4
Evaluators	3
Scoring	3
Accuracy	2
Comparative Analysis	2
Correlation	2
Generalizability Theory	2
Training	2
Auditory Perception	1
Automation	1
Certification	1
Classification	1
Computer Assisted Testing	1
Cutting Scores	1
Data Analysis	1
English (Second Language)	1
Error of Measurement	1
Essays	1
Evaluation Methods	1
Expertise	1
Factor Analysis	1
Factor Structure	1
Feedback (Response)	1

Source

Language Testing

Author

Attali, Yigal	1
Davis, Larry	1
Katzenberger, Irit	1
Lin, Chih-Kai	1
Meilijson, Sara	1
Sun, Yu	1
Wang, Zhen	1
Zechner, Klaus	1

Publication Type

Journal Articles	5
Reports - Research	5

Assessments and Surveys

Test of English as a Foreign…

Showing all 5 results

Monitoring the Performance of Human and Automated Scores for Spoken Responses

Peer reviewed

Wang, Zhen; Zechner, Klaus; Sun, Yu – Language Testing, 2018

As automated scoring systems for spoken responses are increasingly used in language assessments, testing organizations need to analyze their performance, as compared to human raters, across several dimensions, for example, on individual items or based on subgroups of test takers. In addition, there is a need in testing organizations to establish…

Descriptors: Automation, Scoring, Speech Tests, Language Tests

Working with Sparse Data in Rated Language Tests: Generalizability Theory Applications

Peer reviewed

Lin, Chih-Kai – Language Testing, 2017

Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…

Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy

The Influence of Training and Experience on Rater Performance in Scoring Spoken Language

Peer reviewed

Davis, Larry – Language Testing, 2016

Two factors were investigated that are thought to contribute to consistency in rater scoring judgments: rater training and experience in scoring. Also considered were the relative effects of scoring rubrics and exemplars on rater performance. Experienced teachers of English (N = 20) scored recorded responses from the TOEFL iBT speaking test prior…

Descriptors: Evaluators, Oral Language, Scores, Language Tests

A Comparison of Newly-Trained and Experienced Raters on a Standardized Writing Assessment

Peer reviewed

Attali, Yigal – Language Testing, 2016

A short training program for evaluating responses to an essay writing task consisted of scoring 20 training essays with immediate feedback about the correct score. The same scoring session also served as a certification test for trainees. Participants with little or no previous rating experience completed this session and 14 trainees who passed an…

Descriptors: Writing Evaluation, Writing Tests, Standardized Tests, Evaluators

Hebrew Language Assessment Measure for Preschool Children: A Comparison between Typically Developing Children and Children with Specific Language Impairment

Peer reviewed

Katzenberger, Irit; Meilijson, Sara – Language Testing, 2014

The Katzenberger Hebrew Language Assessment for Preschool Children (henceforth: the KHLA) is the first comprehensive, standardized language assessment tool developed in Hebrew specifically for older preschoolers (4;0-5;11 years). The KHLA is a norm-referenced, Hebrew specific assessment, based on well-established psycholinguistic principles, as…

Descriptors: Semitic Languages, Preschool Children, Language Impairments, Language Tests