Bijani, Houman; Hashempour, Bahareh; Ibrahim, Khaled Ahmed Abdel-Al; Orabah, Salim Said Bani; Heydarnejad, Tahereh – Language Testing in Asia, 2022
Due to subjectivity in oral assessment, much concentration has been put on obtaining a satisfactory measure of consistency among raters. However, the process for obtaining more consistency might not result in valid decisions. One matter that is at the core of both reliability and validity in oral assessment is rater training. Recently,…
Descriptors: Oral Language, Language Tests, Feedback (Response), Bias
Huiying Cai; Xun Yan – Language Testing, 2024
Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…
Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation
Jia, Wenfeng; Zhang, Peixin – Language Testing in Asia, 2023
It is widely believed that raters' cognition is an important aspect of writing assessment, as it has both logical and temporal priority over scores. Based on a critical review of previous research in this area, it is found that raters' cognition can be boiled down to two fundamental issues: building text images and strategies for articulating scores.…
Descriptors: Problem Solving, Cognitive Processes, Writing Evaluation, Evaluators
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability
Doosti, Mehdi; Ahmadi Safa, Mohammad – International Journal of Language Testing, 2021
This study examined the effect of rater training on promoting inter-rater reliability in oral language assessment. It also investigated whether rater training and the consideration of the examinees' expectations by the examiners have any effect on test-takers' perceptions of being fairly evaluated. To this end, four raters scored 31 Iranian…
Descriptors: Oral Language, Language Tests, Interrater Reliability, Training
Akbari, Alireza; Shahnazari, Mohammadtaghi – Language Testing in Asia, 2019
The present research paper introduces a translation evaluation method called Calibrated Parsing Items Evaluation (CPIE hereafter). This evaluation method maximizes translators' performance through identifying the parsing items with an optimal p-docimology and d-index (item discrimination). This method checks all the possible parses (annotations)…
Descriptors: Test Items, Translation, Computer Software, Evaluators
Dimova, Slobodanka – Language Teaching Research Quarterly, 2022
Drawing on Glenn Fulcher's extensive work in performance-based language assessment of speaking, this paper explores the assessment of L2 speaking ability in local language testing contexts. For that purpose, I review Fulcher's influential work that highlights the relationship between the speaking construct, the task, the performance, and the…
Descriptors: Language Tests, Speech Communication, Performance Based Assessment, Second Language Learning
Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018
The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on…
Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability
Han, Chao; Lu, Xiaolei – Computer Assisted Language Learning, 2023
The use of translation and interpreting (T&I) in the language learning classroom is commonplace, serving various pedagogical and assessment purposes. Previous utilization of T&I exercises is driven largely by their potential to enhance language learning, whereas the latest trend has begun to underscore T&I as a crucial skill to be…
Descriptors: Translation, Computational Linguistics, Correlation, Language Processing
Thai, Thuy; Sheehan, Susan – Language Education & Assessment, 2022
In language performance tests, raters are important as their scoring decisions determine which aspects of performance the scores represent; however, raters are considered as one of the potential sources contributing to unwanted variability in scores (Davis, 2012). Although a great number of studies have been conducted to unpack how rater…
Descriptors: Rating Scales, Speech Communication, Second Language Learning, Second Language Instruction
Lamprianou, Iasonas – Educational and Psychological Measurement, 2018
It is common practice for assessment programs to organize qualifying sessions during which the raters (often known as "markers" or "judges") demonstrate their consistency before operational rating commences. Because of the high-stakes nature of many rating activities, the research community tends to continuously explore new…
Descriptors: Social Networks, Network Analysis, Comparative Analysis, Innovation
Linlin, Cao – English Language Teaching, 2020
Through Many-Facet Rasch analysis, this study explores the rating differences between 1 computer automatic rater and 5 expert teacher raters in scoring 119 students in a computerized English listening-speaking test. Results indicate that both the automatic and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…
Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Hurtado Albir, Amparo; Pavani, Stefano – Interpreter and Translator Trainer, 2018
The purpose of this paper is to present an experimental study on summative assessment in translation teaching. The paper emphasises the need to use, as an alternative to traditional assessment (i.e. the translation of a text under exam conditions), multidimensional assessment based on a range of criterion-referenced and competence-based assessment…
Descriptors: Translation, Summative Evaluation, Language Tests, Comparative Analysis
Gaillard, Stéphanie; Tremblay, Annie – Language Learning, 2016
This study investigated the elicited imitation task (EIT) as a tool for measuring linguistic proficiency in a second/foreign (L2) language, focusing on French. Nonnative French speakers (n = 94) and native French speakers (n = 6) completed an EIT that included 50 sentences varying in length and complexity. Three raters evaluated productions on…
Descriptors: Language Proficiency, Cloze Procedure, Questionnaires, Language Tests